Disclaimer:

 

I'm not advocating this as the best approach, it's just what I'm currently doing. I put 
this together pretty quickly, but it should be mostly complete for setting up 
Accumulo on CDH HDFS/ZooKeeper.

 

 

 

I always do something like this first on CentOS:

 

$ yum install -y ntp openssh-clients unzip

 

# set up sshd and ntpd as needed
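
For example, on CentOS 6 that usually just means enabling and starting the services (a sketch; adjust for your init system):

$ chkconfig sshd on && service sshd start
$ chkconfig ntpd on && service ntpd start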

 

# install the JDK RPM
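
For example (the RPM filename here is hypothetical; use whatever JDK build you downloaded):

$ rpm -ivh jdk-7u51-linux-x64.rpm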

 

# bash this to set up OS specifics

 

echo "Disabling SELINUX for Optimal CDH Compatability..."

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

 

echo "Increasing uLimit, aka File Descripter/Handlers for all Users..."

echo "# Adding Support for CDH" >> /etc/security/limits.conf

echo "*              -     nofile              65536" >> 
/etc/security/limits.conf

 

echo "Disabling IPv6..."

echo "# Disable ipv6" >> /etc/sysctl.conf

echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf

echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf

 

echo "Increasing Swapiness Factor to limit use of swap space."

echo "# swappiness for accumulo" >> /etc/sysctl.conf

echo "vm.swappiness = 10" >> /etc/sysctl.conf

 

 

Reboot, then verify the OS settings/services/JDK version…

 

Then I usually extract Accumulo to /opt/accumulo/accumulo-1.5.0
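
Roughly, assuming the 1.5.0 binary tarball is sitting in the current directory:

$ mkdir -p /opt/accumulo
$ tar xzf accumulo-1.5.0-bin.tar.gz -C /opt/accumulo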

 

Make a symlink: /opt/accumulo/accumulo-current -> ./accumulo-1.5.0
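
i.e. something like:

$ ln -s /opt/accumulo/accumulo-1.5.0 /opt/accumulo/accumulo-current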

 

# make a directory for Accumulo logs, wherever you like…

mkdir /var/log/accumulo

 

# let the hdfs user own all your Accumulo folders

chown -R hdfs:hdfs /opt/accumulo

chown -R hdfs:hdfs /var/log/accumulo

 

# set the hdfs user's password for the next step (as root)

passwd hdfs

 

# set up passwordless SSH (test as hdfs afterwards; you should be able to ssh <node> without entering credentials)

su - hdfs

ssh-copy-id <node>  # repeat for all tablet server nodes
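
A hypothetical loop, with placeholder hostnames for the tablet servers (it will prompt for the hdfs password set above):

$ for host in tserver1 tserver2 tserver3; do ssh-copy-id $host; done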

 

#update your iptables
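
For example, to open the default Accumulo 1.5 ports between cluster nodes (a sketch; verify the ports against your site config and restrict the source addresses appropriately):

# master 9999, tserver 9997, gc 50091, monitor 50095, tracer 12234
$ iptables -I INPUT -p tcp -m multiport --dports 9997,9999,12234,50091,50095 -j ACCEPT
$ service iptables save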

 

 

# env vars

export ACCUMULO_HOME=/opt/accumulo/accumulo-1.5.0

export JAVA_HOME=/usr/java/default  # jdk7 in my last install worked fine
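
One way to make these persistent for every login shell is a profile.d script, e.g.:

$ echo 'export ACCUMULO_HOME=/opt/accumulo/accumulo-1.5.0' >> /etc/profile.d/accumulo.sh
$ echo 'export JAVA_HOME=/usr/java/default' >> /etc/profile.d/accumulo.sh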

 

Settings for accumulo-env.sh in $ACCUMULO_HOME/conf:

# cdh4

export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs

export HADOOP_MAPREDUCE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce

test -z "$HADOOP_CONF_DIR"       && export 
HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"

test -z "$JAVA_HOME"             && export JAVA_HOME=/usr/java/default

test -z "$ZOOKEEPER_HOME"        && export 
ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper

test -z "$ACCUMULO_LOG_DIR"      && export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs

 

# update all files as appropriate in /opt/accumulo/accumulo-current/conf/*

masters, monitor, slaves, tracers, gc, accumulo-site.xml, accumulo-env.sh
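
If you're starting from scratch, the example configs shipped with Accumulo are a reasonable baseline; pick the memory profile that fits your nodes (3GB here is just an example):

$ cp $ACCUMULO_HOME/conf/examples/3GB/standalone/* $ACCUMULO_HOME/conf/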

 

#accumulo-site.xml

<property>

    <name>general.classpaths</name>

    <value>

      $ACCUMULO_HOME/server/target/classes/,

      $ACCUMULO_HOME/lib/accumulo-server.jar,

      $ACCUMULO_HOME/core/target/classes/,

      $ACCUMULO_HOME/lib/accumulo-core.jar,

      $ACCUMULO_HOME/start/target/classes/,

      $ACCUMULO_HOME/lib/accumulo-start.jar,

      $ACCUMULO_HOME/fate/target/classes/,

      $ACCUMULO_HOME/lib/accumulo-fate.jar,

      $ACCUMULO_HOME/proxy/target/classes/,

      $ACCUMULO_HOME/lib/accumulo-proxy.jar,

      $ACCUMULO_HOME/lib/[^.].*.jar,

      $ZOOKEEPER_HOME/zookeeper[^.].*.jar,

      $HADOOP_CONF_DIR,

      $HADOOP_PREFIX/[^.].*.jar,

      $HADOOP_PREFIX/lib/[^.].*.jar,

      $HADOOP_HDFS_HOME/.*.jar,

      $HADOOP_HDFS_HOME/lib/.*.jar,

      $HADOOP_MAPREDUCE_HOME/.*.jar,

      $HADOOP_MAPREDUCE_HOME/lib/.*.jar

    </value>

    <description>Classpaths that accumulo checks for updates and class files.

      When using the Security Manager, please remove the ".../target/classes/" values.

    </description>

  </property>
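
The classpath is only part of accumulo-site.xml; you'll also need to point Accumulo at your CDH ZooKeeper quorum and change the instance secret. A minimal sketch, with placeholder hostnames:

  <property>
    <name>instance.zookeeper.host</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>

  <property>
    <name>instance.secret</name>
    <value>CHANGE_ME</value>
  </property>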

 

Then of course, always run your Accumulo binaries/scripts as the hdfs user. I'm sure I'm missing a few steps here and there…

 

$ACCUMULO_HOME/bin/accumulo init

…

$ACCUMULO_HOME/bin/start-all.sh
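
For what it's worth, init will prompt for an instance name and an initial root password, and once start-all.sh finishes you can hit the monitor page (default port 50095) to confirm the tablet servers came up.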

 

 

 

From: [email protected] [mailto:[email protected]] On Behalf Of Sean Busbey
Sent: Thursday, January 16, 2014 2:20 PM
To: Accumulo User List
Subject: Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

 

 

On Thu, Jan 16, 2014 at 1:14 PM, Kesten Broughton <[email protected]> wrote:

"You should make sure to correct the maximum number of open files for the user 
that is running Accumulo."

I have the following in all /etc/security/limits.conf in my accumulo cluster

hdfs soft nofile 65536
hdfs hard nofile 65536

However, I see this for all nodes.
WARN : Max files open on 10.0.11.208 is 32768, recommend 65536

Should it be a different user or something?

'the user that is running Accumulo'
sudo hdfs
hdfs$ bin/accumulo -u root

so is hdfs or root the accumulo user?

 

The user in question here is the one who starts the Accumulo server processes. 
In production environments this should be a user dedicated to Accumulo. FWIW, I 
usually name this user "accumulo".

 

How do you start up Accumulo? a service script? running $ACCUMULO_HOME/bin/start-all.sh? something else?
