This is an automated email from the ASF dual-hosted git repository. kturner pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/accumulo-website.git
The following commit(s) were added to refs/heads/asf-site by this push: new e29fede Jekyll build from master:e2c23d8 e29fede is described below commit e29fede55a49559d45048d63dcae4695dbe3253c Author: Keith Turner <ktur...@apache.org> AuthorDate: Wed Sep 11 09:10:26 2019 -0400 Jekyll build from master:e2c23d8 Add blog post about storing Accumulo data in S3 (#192) --- blog/2019/09/10/accumulo-S3-notes.html | 302 +++++++++++++++++++++++++++++++++ feed.xml | 210 +++++++++++++++-------- index.html | 14 +- news/index.html | 7 + redirects.json | 2 +- search_data.json | 8 + 6 files changed, 468 insertions(+), 75 deletions(-) diff --git a/blog/2019/09/10/accumulo-S3-notes.html b/blog/2019/09/10/accumulo-S3-notes.html new file mode 100644 index 0000000..cb536ed --- /dev/null +++ b/blog/2019/09/10/accumulo-S3-notes.html @@ -0,0 +1,302 @@ +<!DOCTYPE html> +<html lang="en"> +<head> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> +<meta charset="utf-8"> +<meta http-equiv="X-UA-Compatible" content="IE=edge"> +<meta name="viewport" content="width=device-width, initial-scale=1"> +<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/paper/bootstrap.min.css" rel="stylesheet" integrity="sha384-awusxf8AUojygHf2+joICySzB780jVvQaVCAt1clU3QsyAitLGul28Qxb2r1e5g+" crossorigin="anonymous"> +<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet"> +<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.css"> +<link href="/css/accumulo.css" rel="stylesheet" type="text/css"> + +<title>Using S3 as a data store for Accumulo</title> + +<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js" integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44=" crossorigin="anonymous"></script> +<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> +<script type="text/javascript" src="https://cdn.datatables.net/v/bs/jq-2.2.3/dt-1.10.12/datatables.min.js"></script> +<script> + // show location of canonical site if not currently on the canonical site + $(function() { + var host = window.location.host; + if (typeof host !== 'undefined' && host !== 'accumulo.apache.org') { + $('#non-canonical').show(); + } + }); + + $(function() { + // decorate section headers with anchors + return $("h2, h3, h4, h5, h6").each(function(i, el) { + var $el, icon, id; + $el = $(el); + id = $el.attr('id'); + icon = '<i class="fa fa-link"></i>'; + if (id) { + return $el.append($("<a />").addClass("header-link").attr("href", "#" + id).html(icon)); + } + }); + }); + + // fix sidebar width in documentation + $(function() { + var $affixElement = $('div[data-spy="affix"]'); + $affixElement.width($affixElement.parent().width()); + }); +</script> + +</head> +<body 
style="padding-top: 100px"> + + <nav class="navbar navbar-default navbar-fixed-top"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-items"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a href="/"><img id="nav-logo" alt="Apache Accumulo" class="img-responsive" src="/images/accumulo-logo.png" width="200" + /></a> + </div> + <div class="collapse navbar-collapse" id="navbar-items"> + <ul class="nav navbar-nav"> + <li class="nav-link"><a href="/downloads">Download</a></li> + <li class="nav-link"><a href="/tour">Tour</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Releases<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/release/accumulo-2.0.0/">2.0.0 (Latest)</a></li> + <li><a href="/release/accumulo-1.9.3/">1.9.3</a></li> + <li><a href="/release/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/docs/2.x">User Manual (2.x)</a></li> + <li><a href="/quickstart-1.x">Quickstart (1.x)</a></li> + <li><a href="/accumulo2-maven-plugin">Accumulo Maven Plugin</a></li> + <li><a href="/1.9/accumulo_user_manual.html">User Manual (1.9)</a></li> + <li><a href="/1.9/apidocs">Javadocs (1.9)</a></li> + <li><a href="/external-docs">External Docs</a></li> + <li><a href="/docs-archive/">Archive</a></li> + </ul> + </li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="/contact-us">Contact Us</a></li> + <li><a href="/how-to-contribute">How To Contribute</a></li> + <li><a href="/people">People</a></li> + <li><a 
href="/related-projects">Related Projects</a></li> + </ul> + </li> + <li class="nav-link"><a href="/search">Search</a></li> + </ul> + <ul class="nav navbar-nav navbar-right"> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"><img alt="Apache Software Foundation" src="https://www.apache.org/foundation/press/kit/feather.svg" width="15"/><span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://www.apache.org">Apache Homepage <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/licenses/">License <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/sponsorship">Sponsorship <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/security">Security <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/thanks">Thanks <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/foundation/policies/conduct">Code of Conduct <i class="fa fa-external-link"></i></a></li> + <li><a href="https://www.apache.org/events/current-event.html">Current Event <i class="fa fa-external-link"></i></a></li> + </ul> + </li> + </ul> + </div> + </div> +</nav> + + + <div class="container"> + <div class="row"> + <div class="col-md-12"> + + <div id="non-canonical" style="display: none; background-color: #F0E68C; padding-left: 1em;"> + Visit the official site at: <a href="https://accumulo.apache.org">https://accumulo.apache.org</a> + </div> + <div id="content"> + + <h1 class="title">Using S3 as a data store for Accumulo</h1> + + <p> +<b>Author: </b> Keith Turner<br> +<b>Date: </b> 10 Sep 2019<br> + +</p> + +<p>Accumulo can store its files in S3, however S3 does not support the needs of +write ahead logs and the Accumulo metadata table. One way to solve this problem +is to store the metadata table and write ahead logs in HDFS and everything else +in S3. 
This post shows how to do that using Accumulo 2.0 and Hadoop 3.2.0. +Running on S3 requires a new feature in Accumulo 2.0: volume choosers that are +aware of write ahead logs.</p> + +<h2 id="hadoop-setup">Hadoop setup</h2> + +<p>At least the following settings should be added to Hadoop’s <code class="highlighter-rouge">core-site.xml</code> file on each node in the cluster.</p> + +<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><property></span> + <span class="nt"><name></span>fs.s3a.access.key<span class="nt"></name></span> + <span class="nt"><value></span>KEY<span class="nt"></value></span> +<span class="nt"></property></span> +<span class="nt"><property></span> + <span class="nt"><name></span>fs.s3a.secret.key<span class="nt"></name></span> + <span class="nt"><value></span>SECRET<span class="nt"></value></span> +<span class="nt"></property></span> +<span class="c"><!-- without this setting Accumulo tservers would have problems when trying to open lots of files --></span> +<span class="nt"><property></span> + <span class="nt"><name></span>fs.s3a.connection.maximum<span class="nt"></name></span> + <span class="nt"><value></span>128<span class="nt"></value></span> +<span class="nt"></property></span> +</code></pre></div></div> + +<p>See <a href="https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#S3A">S3A docs</a> +for more S3A settings. To get the hadoop command to work with S3, set <code class="highlighter-rouge">export +HADOOP_OPTIONAL_TOOLS="hadoop-aws"</code> in <code class="highlighter-rouge">hadoop-env.sh</code>.</p> + +<p>When trying to use Accumulo with Hadoop’s AWS jar, <a href="https://issues.apache.org/jira/browse/HADOOP-16080">HADOOP-16080</a> was +encountered. The following instructions build a relocated hadoop-aws jar as a +workaround. 
After building the jar, copy it to all nodes in the cluster.</p> + +<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> /tmp/haws-reloc +<span class="nb">cd</span> /tmp/haws-reloc +<span class="c"># get the Maven pom file that builds a relocated jar</span> +wget https://gist.githubusercontent.com/keith-turner/f6dcbd33342732e42695d66509239983/raw/714cb801eb49084e0ceef5c6eb4027334fd51f87/pom.xml +mvn package <span class="nt">-Dhadoop</span>.version<span class="o">=</span><your hadoop version> +<span class="c"># the new jar will be in target</span> +<span class="nb">ls </span>target/ +</code></pre></div></div> + +<h2 id="accumulo-setup">Accumulo setup</h2> + +<p>For each node in the cluster, modify <code class="highlighter-rouge">accumulo-env.sh</code> to add S3 jars to the +classpath. Your versions may differ depending on your Hadoop version; the +following versions were included with Hadoop 3.2.0.</p> + +<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">conf</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">lib</span><span class="k">}</span><span class="s2">/*:</span><span class="k">${</span><span class="nv">HADOOP_CONF_DIR</span><span class="k">}</span><span class="s2">:</ [...] 
+<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:/somedir/hadoop-aws-relocated.3.2.0.jar"</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</span><span class="s2">/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar"</span> +<span class="c"># The following are dependencies needed by the previous jars and are subject to change</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</span><span class="s2">/share/hadoop/common/lib/jaxb-api-2.2.11.jar"</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</span><span class="s2">/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar"</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</span><span class="s2">/share/hadoop/common/lib/commons-lang3-3.7.jar"</span> +<span class="nb">export </span>CLASSPATH +</code></pre></div></div> + +<p>Set the following in <code class="highlighter-rouge">accumulo.properties</code> and then run <code class="highlighter-rouge">accumulo init</code>, but don’t start Accumulo.</p> + +<div 
class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">instance.volumes</span><span class="p">=</span><span class="s">hdfs://<name node>/accumulo</span> +</code></pre></div></div> + +<p>After running Accumulo init we need to configure storing write ahead logs in +HDFS. Set the following in <code class="highlighter-rouge">accumulo.properties</code>.</p> + +<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">instance.volumes</span><span class="p">=</span><span class="s">hdfs://<name node>/accumulo,s3a://<bucket>/accumulo</span> +<span class="py">general.volume.chooser</span><span class="p">=</span><span class="s">org.apache.accumulo.server.fs.PreferredVolumeChooser</span> +<span class="py">general.custom.volume.preferred.default</span><span class="p">=</span><span class="s">s3a://<bucket>/accumulo</span> +<span class="py">general.custom.volume.preferred.logger</span><span class="p">=</span><span class="s">hdfs://<namenode>/accumulo</span> + +</code></pre></div></div> + +<p>Run <code class="highlighter-rouge">accumulo init --add-volumes</code> to initialize the S3 volume. Doing this +in two steps avoids putting any Accumulo metadata files in S3 during init. +Copy <code class="highlighter-rouge">accumulo.properties</code> to all nodes and start Accumulo.</p> + +<p>Individual tables can be configured to store their files in HDFS by setting the +table property <code class="highlighter-rouge">table.custom.volume.preferred</code>. 
This should be set for the +metadata table, in case it splits, using the following Accumulo shell command.</p> + +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>config -t accumulo.metadata -s table.custom.volume.preferred=hdfs://<namenode>/accumulo +</code></pre></div></div> + +<h2 id="accumulo-example">Accumulo example</h2> + +<p>The following Accumulo shell session shows an example of writing data to S3 and +reading it back. It also shows scanning the metadata table to verify the data +is stored in S3.</p> + +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@muchos> createtable s3test +root@muchos s3test> insert r1 f1 q1 v1 +root@muchos s3test> insert r1 f1 q2 v2 +root@muchos s3test> flush -w +2019-09-10 19:39:04,695 [shell.Shell] INFO : Flush of table s3test completed. +root@muchos s3test> scan +r1 f1:q1 [] v1 +r1 f1:q2 [] v2 +root@muchos s3test> scan -t accumulo.metadata -c file +2< file:s3a://<bucket>/accumulo/tables/2/default_tablet/F000007b.rf [] 234,2 +</code></pre></div></div> + +<p>These instructions were only tested a few times and may not result in a stable +system. I have <a href="https://gist.github.com/keith-turner/149f35f218d10e13227461714012d7bf">run</a> a 24hr test with Accumulo and S3.</p> + +<h2 id="is-s3guard-needed">Is S3Guard needed?</h2> + +<p>I am not completely certain about this, but I don’t think S3Guard is needed for +regular Accumulo tables. There are two reasons I think this is so. First, each +Accumulo user tablet stores its list of files in the metadata table using +absolute URIs. This allows a tablet to have files on multiple DFS instances. +Therefore, Accumulo never does a DFS list operation to get a tablet’s files; it +always uses what’s in the metadata table. Second, Accumulo gives each file a +unique name using a counter stored in ZooKeeper, and file names are never +reused.</p> + +<p>Things are slightly different for Accumulo’s metadata. 
User tablets store +their file list in the metadata table. Metadata tablets store their file list +in the root table. The root table stores its file list in DFS. Therefore it +would be dangerous to place the root tablet in S3 w/o using S3Guard. That is +why these instructions place Accumulo metadata in HDFS. <strong>Hopefully</strong> this +configuration allows the system to be consistent w/o using S3Guard.</p> + +<p>When Accumulo 2.1.0 is released with the changes made by <a href="https://github.com/apache/accumulo/issues/1313">#1313</a> for issue +<a href="https://github.com/apache/accumulo/issues/936">#936</a>, it may be possible to store the metadata table in S3 w/o +S3Guard. If this is the case, then only the write ahead logs would need to be +stored in HDFS.</p> + + + +<p><strong>View all posts in the <a href="/news">news archive</a></strong></p> + + </div> + + +<footer> + + <p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="100" /></a></p> + + <p>Copyright © 2011-2019 <a href="https://www.apache.org">The Apache Software Foundation</a>. 
+Licensed under the <a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>.</p> + + <p>Apache®, the names of Apache projects and their logos, and the multicolor feather +logo are registered trademarks or trademarks of The Apache Software Foundation +in the United States and/or other countries.</p> + +</footer> + + + </div> + </div> + </div> +</body> +</html> diff --git a/feed.xml b/feed.xml index 7e25515..fe51dbd 100644 --- a/feed.xml +++ b/feed.xml @@ -6,12 +6,153 @@ </description> <link>https://accumulo.apache.org/</link> <atom:link href="https://accumulo.apache.org/feed.xml" rel="self" type="application/rss+xml"/> - <pubDate>Tue, 13 Aug 2019 09:44:36 -0400</pubDate> - <lastBuildDate>Tue, 13 Aug 2019 09:44:36 -0400</lastBuildDate> + <pubDate>Wed, 11 Sep 2019 09:10:18 -0400</pubDate> + <lastBuildDate>Wed, 11 Sep 2019 09:10:18 -0400</lastBuildDate> <generator>Jekyll v3.8.6</generator> <item> + <title>Using S3 as a data store for Accumulo</title> + <description><p>Accumulo can store its files in S3, however S3 does not support the needs of +write ahead logs and the Accumulo metadata table. One way to solve this problem +is to store the metadata table and write ahead logs in HDFS and everything else +in S3. This post shows how to do that using Accumulo 2.0 and Hadoop 3.2.0. 
+Running on S3 requires a new feature in Accumulo 2.0: volume choosers that are +aware of write ahead logs.</p> + +<h2 id="hadoop-setup">Hadoop setup</h2> + +<p>At least the following settings should be added to Hadoop’s <code class="highlighter-rouge">core-site.xml</code> file on each node in the cluster.</p> + +<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;property&gt;</span> + <span class="nt">&lt;name&gt;</span>fs.s3a.access.key<span class="nt">&lt;/name&gt;</span> + <span class="nt">&lt;value&gt;</span>KEY<span class="nt">&lt;/value&gt;</span> +<span class="nt">&lt;/property&gt;</span> +<span class="nt">&lt;property&gt;</span> + <span class="nt">&lt;name&gt;</span>fs.s3a.secret.key<span class="nt">&lt;/name&gt;</span> + <span class="nt">&lt;value&gt;</span>SECRET<span class="nt">&lt;/value&gt;</span> +<span class="nt">&lt;/property&gt;</span> +<span class="c">&lt;!-- without this setting Accumulo tservers would have problems when trying to open lots of files --&gt;</span> +<span class="nt">&lt;property&gt;</span> + <span class="nt">&lt;name&gt;</span>fs.s3a.connection.maximum<span class="nt">&lt;/name&gt;</span> + <span class="nt">&lt;value&gt;</span>128<span class="nt">&lt;/value&gt;</span> +<span class="nt">&lt;/property&gt;</span> +</code></pre></div></div> + +<p>See <a href="https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#S3A">S3A docs</a> +for more S3A settings. To get the hadoop command to work with S3, set <code class="highlighter-rouge">export +HADOOP_OPTIONAL_TOOLS="hadoop-aws"</code> in <code class="highlighter-rouge">hadoop-env.sh</code>.</p> + +<p>When trying to use Accumulo with Hadoop’s AWS jar, <a href="https://issues.apache.org/jira/browse/HADOOP-16080">HADOOP-16080</a> was +encountered. The following instructions build a relocated hadoop-aws jar as a +workaround. 
After building the jar, copy it to all nodes in the cluster.</p> + +<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> /tmp/haws-reloc +<span class="nb">cd</span> /tmp/haws-reloc +<span class="c"># get the Maven pom file that builds a relocated jar</span> +wget https://gist.githubusercontent.com/keith-turner/f6dcbd33342732e42695d66509239983/raw/714cb801eb49084e0ceef5c6eb4027334fd51f87/pom.xml +mvn package <span class="nt">-Dhadoop</span>.version<span class="o">=</span>&lt;your hadoop version&gt; +<span class="c"># the new jar will be in target</span> +<span class="nb">ls </span>target/ +</code></pre></div></div> + +<h2 id="accumulo-setup">Accumulo setup</h2> + +<p>For each node in the cluster, modify <code class="highlighter-rouge">accumulo-env.sh</code> to add S3 jars to the +classpath. Your versions may differ depending on your Hadoop version; the +following versions were included with Hadoop 3.2.0.</p> + +<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">conf</span><span class="k">}</span><span class="s2">:</span&g [...] +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:/somedir/hadoop-aws-relocated.3.2.0.jar"</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</sp [...] 
+<span class="c"># The following are dependencies needed by the previous jars and are subject to change</span> +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</sp [...] +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</sp [...] +<span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CLASSPATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HADOOP_HOME</span><span class="k">}</sp [...] +<span class="nb">export </span>CLASSPATH +</code></pre></div></div> + +<p>Set the following in <code class="highlighter-rouge">accumulo.properties</code> and then run <code class="highlighter-rouge">accumulo init</code>, but don’t start Accumulo.</p> + +<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">instance.volumes</span><span class="p">=</span><span class="s">hdfs://&lt;name node&gt;/accumulo</span> +</code></pre></div></div> + +<p>After running Accumulo init we need to configure storing write ahead logs in +HDFS. 
Set the following in <code class="highlighter-rouge">accumulo.properties</code>.</p> + +<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">instance.volumes</span><span class="p">=</span><span class="s">hdfs://&lt;name node&gt;/accumulo,s3a://&lt;bucket&gt;/accumulo</span> +<span class="py">general.volume.chooser</span><span class="p">=</span><span class="s">org.apache.accumulo.server.fs.PreferredVolumeChooser</span> +<span class="py">general.custom.volume.preferred.default</span><span class="p">=</span><span class="s">s3a://&lt;bucket&gt;/accumulo</span> +<span class="py">general.custom.volume.preferred.logger</span><span class="p">=</span><span class="s">hdfs://&lt;namenode&gt;/accumulo</span> + +</code></pre></div></div> + +<p>Run <code class="highlighter-rouge">accumulo init --add-volumes</code> to initialize the S3 volume. Doing this +in two steps avoids putting any Accumulo metadata files in S3 during init. +Copy <code class="highlighter-rouge">accumulo.properties</code> to all nodes and start Accumulo.</p> + +<p>Individual tables can be configured to store their files in HDFS by setting the +table property <code class="highlighter-rouge">table.custom.volume.preferred</code>. This should be set for the +metadata table in case it splits using the following Accumulo shell command.</p> + +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>config -t accumulo.metadata -s table.custom.volume.preferred=hdfs://&lt;namenode&gt;/accumulo +</code></pre></div></div> + +<h2 id="accumulo-example">Accumulo example</h2> + +<p>The following Accumulo shell session shows an example of writing data to S3 and +reading it back. 
It also shows scanning the metadata table to verify the data +is stored in S3.</p> + +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@muchos&gt; createtable s3test +root@muchos s3test&gt; insert r1 f1 q1 v1 +root@muchos s3test&gt; insert r1 f1 q2 v2 +root@muchos s3test&gt; flush -w +2019-09-10 19:39:04,695 [shell.Shell] INFO : Flush of table s3test completed. +root@muchos s3test&gt; scan +r1 f1:q1 [] v1 +r1 f1:q2 [] v2 +root@muchos s3test&gt; scan -t accumulo.metadata -c file +2&lt; file:s3a://&lt;bucket&gt;/accumulo/tables/2/default_tablet/F000007b.rf [] 234,2 +</code></pre></div></div> + +<p>These instructions were only tested a few times and may not result in a stable +system. I have <a href="https://gist.github.com/keith-turner/149f35f218d10e13227461714012d7bf">run</a> a 24hr test with Accumulo and S3.</p> + +<h2 id="is-s3guard-needed">Is S3Guard needed?</h2> + +<p>I am not completely certain about this, but I don’t think S3Guard is needed for +regular Accumulo tables. There are two reasons I think this is so. First, each +Accumulo user tablet stores its list of files in the metadata table using +absolute URIs. This allows a tablet to have files on multiple DFS instances. +Therefore, Accumulo never does a DFS list operation to get a tablet’s files; it +always uses what’s in the metadata table. Second, Accumulo gives each file a +unique name using a counter stored in ZooKeeper, and file names are never +reused.</p> + +<p>Things are slightly different for Accumulo’s metadata. User tablets store +their file list in the metadata table. Metadata tablets store their file list +in the root table. The root table stores its file list in DFS. Therefore it +would be dangerous to place the root tablet in S3 w/o using S3Guard. That is +why these instructions place Accumulo metadata in HDFS. 
<strong>Hopefully</strong> this +configuration allows the system to be consistent w/o using S3Guard.</p> + +<p>When Accumulo 2.1.0 is released with the changes made by <a href="https://github.com/apache/accumulo/issues/1313">#1313</a> for issue +<a href="https://github.com/apache/accumulo/issues/936">#936</a>, it may be possible to store the metadata table in S3 w/o +S3Guard. If this is the case, then only the write ahead logs would need to be +stored in HDFS.</p> + +</description> + <pubDate>Tue, 10 Sep 2019 00:00:00 -0400</pubDate> + <link>https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html</link> + <guid isPermaLink="true">https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html</guid> + + + <category>blog</category> + + </item> + + <item> <title>Top 10 Reasons to Upgrade</title> <description><p>Accumulo 2.0 has been in development for quite some time now and is packed with new features, bug fixes, performance improvements and redesigned components. All of these changes bring challenges @@ -769,70 +910,5 @@ flushed, compacted, scanned, and deleted the table.</li> </item> - <item> - <title>Apache Accumulo 1.9.1</title> - <description><p>Apache Accumulo 1.9.1 contains bug fixes for a critical data loss bug. Users of -1.8.0, 1.8.1, or 1.9.0 are encouraged to upgrade immediately.</p> - -<ul> - <li><a href="/1.9/accumulo_user_manual.html">User Manual</a> - In-depth developer and administrator documentation</li> - <li><a href="/1.9/apidocs/">Javadocs</a> - Accumulo 1.9 API</li> - <li><a href="/1.9/examples/">Examples</a> - Code with corresponding readme files that give step by -step instructions for running example code</li> -</ul> - -<h2 id="notable-changes">Notable Changes</h2> - -<h3 id="fixes-for-critical-wal-data-loss-bugs-affects-versions-180-190">Fixes for Critical WAL Data Loss Bugs (affects versions 1.8.0-1.9.0)</h3> - -<p>Accumulo 1.9.1 contains multiple bug fixes for write ahead log recovery. 
Write -ahead log recovery is the process of restoring data that was in memory when a -tablet server died. These bugs could lead to data loss and are present in -1.8.0, 1.8.1, and 1.9.0. Because the bugs can negatively impact Accumulo’s -metadata table, <strong>even users that mainly use bulk import may be affected</strong>. It -is <strong>strongly</strong> recommended that anyone using 1.8.0 or greater upgrade -immediately. For more information see issues <a href="https://github.com/apache/accumulo/issues/441">#441</a> and <a href="https://github.com/apache/accumulo/issues/449">#449</a>. These issues -were fixed in pull request <a href="https://github.com/apache/accumulo/issues/443">#443</a> and <a href="https://github.com/apache/accumulo/issues/458">#458</a>.</p> - -<p>The only users who would not be affected by these bugs would be users already -running Accumulo without the recommended write-ahead logs enabled at all -(durability: none), including for the metadata tables. Such users are already -risking data loss when a server fails, but are not subject to any additional -risk from these bugs, which occur during automated recovery from such failures.</p> - -<h2 id="some-wal-recovery-files-were-not-being-properly-cleaned-up">Some WAL recovery files were not being properly cleaned up</h2> - -<p>A less serious bug than the above critical bugs was discovered and fixed, -pertaining to write-ahead log recovery. 
This bug involved recovery files not -being removed properly when no longer required and was fixed in <a href="https://github.com/apache/accumulo/issues/432">#432</a>.</p> - -<h2 id="other-changes">Other Changes</h2> - -<ul> - <li><a href="https://github.com/apache/accumulo/issues?q=project%3Aapache%2Faccumulo%2F5">GitHub</a> - List of issues tracked on GitHub corresponding to this release</li> - <li><a href="/release/accumulo-1.9.0/">1.9.0 release notes</a> - Release notes showing changes in the previous release, 1.9.0</li> -</ul> - -<h2 id="upgrading">Upgrading</h2> - -<p>View the <a href="/docs/2.x/administration/upgrading">Upgrading Accumulo documentation</a> for guidance.</p> - -<h2 id="testing">Testing</h2> - -<p>Continuous ingest with agitation and all integration test were run against this -version. Continuous ingest was run with 9 nodes for 24 hours followed by a -successful verification. The integration tests were run against both Hadoop -2.6.4 and Hadoop 3.0.0.</p> - -</description> - <pubDate>Mon, 14 May 2018 00:00:00 -0400</pubDate> - <link>https://accumulo.apache.org/release/accumulo-1.9.1/</link> - <guid isPermaLink="true">https://accumulo.apache.org/release/accumulo-1.9.1/</guid> - - - <category>release</category> - - </item> - </channel> </rss> diff --git a/index.html b/index.html index dbb49f4..d6538e1 100644 --- a/index.html +++ b/index.html @@ -178,6 +178,13 @@ <div class="row latest-news-item"> <div class="col-sm-12" style="margin-bottom: 5px"> + <span style="font-size: 12px; margin-right: 5px;">Sep 2019</span> + <a href="/blog/2019/09/10/accumulo-S3-notes.html">Using S3 as a data store for Accumulo</a> + </div> + </div> + + <div class="row latest-news-item"> + <div class="col-sm-12" style="margin-bottom: 5px"> <span style="font-size: 12px; margin-right: 5px;">Aug 2019</span> <a href="/blog/2019/08/12/why-upgrade.html">Top 10 Reasons to Upgrade</a> </div> @@ -204,13 +211,6 @@ </div> </div> - <div class="row latest-news-item"> - <div 
class="col-sm-12" style="margin-bottom: 5px"> - <span style="font-size: 12px; margin-right: 5px;">Feb 2019</span> - <a href="/blog/2019/02/28/nosql-day.html">NoSQL Day 2019</a> - </div> - </div> - <div id="news-archive-link"> <p>View all posts in the <a href="/news">news archive</a></p> </div> diff --git a/news/index.html b/news/index.html index a914e2f..b186191 100644 --- a/news/index.html +++ b/news/index.html @@ -147,6 +147,13 @@ <div class="row" style="margin-top: 15px"> + <div class="col-md-1">Sep 10</div> + <div class="col-md-10"><a href="/blog/2019/09/10/accumulo-S3-notes.html">Using S3 as a data store for Accumulo</a></div> + </div> + + + + <div class="row" style="margin-top: 15px"> <div class="col-md-1">Aug 12</div> <div class="col-md-10"><a href="/blog/2019/08/12/why-upgrade.html">Top 10 Reasons to Upgrade</a></div> </div> diff --git a/redirects.json b/redirects.json index c9a2be2..4bda3a4 100644 --- a/redirects.json +++ b/redirects.json @@ -1 +1 @@ -{"/release_notes/1.5.1.html":"https://accumulo.apache.org/release/accumulo-1.5.1/","/release_notes/1.6.0.html":"https://accumulo.apache.org/release/accumulo-1.6.0/","/release_notes/1.6.1.html":"https://accumulo.apache.org/release/accumulo-1.6.1/","/release_notes/1.6.2.html":"https://accumulo.apache.org/release/accumulo-1.6.2/","/release_notes/1.7.0.html":"https://accumulo.apache.org/release/accumulo-1.7.0/","/release_notes/1.5.3.html":"https://accumulo.apache.org/release/accumulo-1.5.3/" [...] 
\ No newline at end of file +{"/release_notes/1.5.1.html":"https://accumulo.apache.org/release/accumulo-1.5.1/","/release_notes/1.6.0.html":"https://accumulo.apache.org/release/accumulo-1.6.0/","/release_notes/1.6.1.html":"https://accumulo.apache.org/release/accumulo-1.6.1/","/release_notes/1.6.2.html":"https://accumulo.apache.org/release/accumulo-1.6.2/","/release_notes/1.7.0.html":"https://accumulo.apache.org/release/accumulo-1.7.0/","/release_notes/1.5.3.html":"https://accumulo.apache.org/release/accumulo-1.5.3/" [...] \ No newline at end of file diff --git a/search_data.json b/search_data.json index 2294245..289322c 100644 --- a/search_data.json +++ b/search_data.json @@ -295,6 +295,14 @@ }, + "blog-2019-09-10-accumulo-s3-notes-html": { + "title": "Using S3 as a data store for Accumulo", + "content" : "Accumulo can store its files in S3, however S3 does not support the needs ofwrite ahead logs and the Accumulo metadata table. One way to solve this problemis to store the metadata table and write ahead logs in HDFS and everything elsein S3. This post shows how to do that using Accumulo 2.0 and Hadoop 3.2.0.Running on S3 requires a new feature in Accumulo 2.0, that volume choosers areaware of write ahead logs.Hadoop setupAt least the following settings should be added [...] + "url": " /blog/2019/09/10/accumulo-S3-notes.html", + "categories": "blog" + } + , + "blog-2019-08-12-why-upgrade-html": { "title": "Top 10 Reasons to Upgrade", "content" : "Accumulo 2.0 has been in development for quite some time now and is packed with new features, bugfixes, performance improvements and redesigned components. All of these changes bring challengeswhen upgrading your production cluster so you may be wondering… why should I upgrade?My top 10 reasons to upgrade. For all changes see the release notes Summaries New Bulk Import Simplified Scripts and Config New Monitor New APIs Offline creation Search Documentation On [...]