Disclaimer:
Not advocating this as the best approach, just what I'm currently doing. I put
this together pretty quickly, but it should be mostly complete for setting up
Accumulo on CDH HDFS/ZK.
I always do something like this first on CentOS:
$ yum install -y ntp openssh-clients unzip
# set up ssh and ntpd as needed
# install the JDK RPM
# bash this to set up OS specifics
echo "Disabling SELINUX for Optimal CDH Compatability..."
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
echo "Increasing uLimit, aka File Descripter/Handlers for all Users..."
echo "# Adding Support for CDH" >> /etc/security/limits.conf
echo "* - nofile 65536" >>
/etc/security/limits.conf
echo "Disabling IPv6..."
echo "# Disable ipv6" >> /etc/sysctl.conf
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "Increasing Swapiness Factor to limit use of swap space."
echo "# swappiness for accumulo" >> /etc/sysctl.conf
echo "vm.swappiness = 10" >> /etc/sysctl.conf
reboot and test OS/services/jdk version…
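After the reboot, a quick sanity check along these lines (just my own habit, adjust as needed):
$ getenforce                     # should print Disabled
$ ulimit -n                      # should print 65536 (run as an affected user)
$ cat /proc/sys/vm/swappiness    # should print 10
$ java -version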
Then I usually extract Accumulo to /opt/accumulo/accumulo-1.5.0 and
make a symlink: /opt/accumulo/accumulo-current -> ./accumulo-1.5.0
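Roughly the commands I use for that (a sketch; the tarball name is whatever your Accumulo 1.5.0 binary distribution is called):
mkdir -p /opt/accumulo
tar xzf accumulo-1.5.0-bin.tar.gz -C /opt/accumulo
ln -s /opt/accumulo/accumulo-1.5.0 /opt/accumulo/accumulo-current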
# make dirs for Accumulo logs, wherever…
mkdir /var/log/accumulo
# let the hdfs user own all your Accumulo folders
chown -R hdfs:hdfs /opt/accumulo
chown -R hdfs:hdfs /var/log/accumulo
# update the hdfs password for the next step (as root):
passwd hdfs
# set up passwordless ssh (test as hdfs afterwards; you should be able to ssh to a
# <node> w/o entering credentials; see the sketch below)
su - hdfs
ssh-copy-id <for all tablet server nodes>
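Something like this, assuming the hdfs user doesn't already have a key (node names are placeholders):
su - hdfs
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id hdfs@tserver1
ssh-copy-id hdfs@tserver2
ssh tserver1 hostname    # should return without prompting for a password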
#update your iptables
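For example (these are the 1.5 default ports as I recall them -- tserver 9997, master 9999, gc 50091, monitor 50095 and 4560, tracer 12234 -- verify against your accumulo-site.xml before opening anything):
iptables -I INPUT -p tcp -m multiport --dports 9997,9999,50091,50095,4560,12234 -j ACCEPT
service iptables save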
#env vars
ACCUMULO_HOME=/opt/accumulo/accumulo-1.5.0
JAVA_HOME=/usr/java/default (JDK 7 worked fine in my last install)
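I usually just drop these in a profile script so they survive logins, e.g. (the file path is my own choice, put it wherever you like):
# /etc/profile.d/accumulo.sh
export ACCUMULO_HOME=/opt/accumulo/accumulo-1.5.0
export JAVA_HOME=/usr/java/default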
Settings for accumulo-env.sh in $ACCUMULO_HOME/conf:
# cdh4
export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs
export HADOOP_MAPREDUCE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
test -z "$HADOOP_CONF_DIR" && export
HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
test -z "$JAVA_HOME" && export JAVA_HOME=/usr/java/default
test -z "$ZOOKEEPER_HOME" && export
ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper
test -z "$ACCUMULO_LOG_DIR" && export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs
# update all files as appropriate in /opt/accumulo/accumulo-current/conf/*:
# masters, monitor, slaves, tracers, gc, accumulo-site.xml, accumulo-env.sh (see example below)
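For the host files, something along these lines (hostnames are placeholders; one tablet server per line in slaves):
echo master1 > masters
echo master1 > monitor
echo master1 > gc
echo master1 > tracers
printf "tserver1\ntserver2\ntserver3\n" > slaves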
#accumulo-site.xml
<property>
  <name>general.classpaths</name>
  <value>
    $ACCUMULO_HOME/server/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-server.jar,
    $ACCUMULO_HOME/core/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-core.jar,
    $ACCUMULO_HOME/start/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-start.jar,
    $ACCUMULO_HOME/fate/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-fate.jar,
    $ACCUMULO_HOME/proxy/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-proxy.jar,
    $ACCUMULO_HOME/lib/[^.].*.jar,
    $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
    $HADOOP_CONF_DIR,
    $HADOOP_PREFIX/[^.].*.jar,
    $HADOOP_PREFIX/lib/[^.].*.jar,
    $HADOOP_HDFS_HOME/.*.jar,
    $HADOOP_HDFS_HOME/lib/.*.jar,
    $HADOOP_MAPREDUCE_HOME/.*.jar,
    $HADOOP_MAPREDUCE_HOME/lib/.*.jar
  </value>
  <description>Classpaths that accumulo checks for updates and class files.
    When using the Security Manager, please remove the ".../target/classes/" values.
  </description>
</property>
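accumulo-site.xml needs more than just the classpath; a minimal sketch of the other properties I also set (the ZK hosts and secret here are placeholders, pick your own):
<property>
  <name>instance.zookeeper.host</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
<property>
  <name>instance.secret</name>
  <value>CHANGE_ME</value>
</property>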
Then of course, always run your Accumulo binaries/scripts as the hdfs
user. I'm sure I'm missing a few steps here and there…
$ACCUMULO_HOME/bin/accumulo init
…
$ACCUMULO_HOME/bin/start-all.sh
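After start-all.sh I usually just sanity-check that the processes came up, e.g.:
$ sudo -u hdfs jps -ml
# expect org.apache.accumulo.start.Main entries for master, tserver, gc,
# monitor, and tracer (and the monitor page on its default port)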
From: [email protected] [mailto:[email protected]] On Behalf Of Sean Busbey
Sent: Thursday, January 16, 2014 2:20 PM
To: Accumulo User List
Subject: Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id
On Thu, Jan 16, 2014 at 1:14 PM, Kesten Broughton <[email protected]> wrote:
"You should make sure to correct the maximum number of open files for the user
that is running Accumulo."
I have the following in /etc/security/limits.conf on every node in my accumulo cluster:
hdfs soft nofile 65536
hdfs hard nofile 65536
However, I see this for all nodes:
WARN : Max files open on 10.0.11.208 is 32768, recommend 65536
Should it be a different user or something?
'the user that is running Accumulo'
sudo hdfs
hdfs$ bin/accumulo -u root
so is hdfs or root the accumulo user?
The user in question here is the one who starts the Accumulo server processes.
In production environments this should be a user dedicated to Accumulo. FWIW, I
usually name this user "accumulo".
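A quick way to check, assuming the limits.conf entries match the user you actually launch with (a fresh login session is needed for limits.conf changes to take effect; processes inherit the limit from whatever shell or session started them):
$ su - hdfs -c 'ulimit -Hn; ulimit -Sn'
# both should report 65536 once the limits.conf change is picked up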
How do you start up Accumulo? a service script? running
$ACCUMULO_HOME/bin/start-all.sh? something else?