Is HDFS actually healthy? Have you checked the namenode status page
(http://$hostname:50070 by default) to make sure the NN is up and out of
safemode, the expected number of DNs have reported in, HDFS reports
available space, etc.?
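If you'd rather check those from the shell, something along these lines
should confirm safemode status and datanode count (command names assume
Hadoop 1.x; on 2.x the equivalent is hdfs dfsadmin):

  # run as the HDFS superuser on the namenode host
  hadoop dfsadmin -safemode get    # should print "Safe mode is OFF"
  hadoop dfsadmin -report          # live datanodes, configured capacity, DFS remaining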
Any other Hadoop details (version, etc) would be helpful too!
Mike Atlas wrote:
Well, I caught the same error again after terminating my machine with a
hard stop - which isn't a normal way to do things, but I fat-fingered a
save of an AMI image of it, thinking I could boot back up just fine afterward.
The only workaround I found was to blow away the HDFS /accumulo directory
and re-init my Accumulo instance --- which is fine for playing around, but
I'm wondering what exactly is going on. I don't want that to happen if I
went to production and had real data.
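For reference, the reset amounted to roughly the following (paths as on my
box; -rmr is the older syntax, newer Hadoop spells it -rm -r):

  $ACCUMULO_HOME/bin/stop-all.sh
  hadoop fs -rmr /accumulo              # wipe Accumulo's directory in HDFS
  $ACCUMULO_HOME/bin/accumulo init      # prompts for instance name and root password
  $ACCUMULO_HOME/bin/start-all.sh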
Thoughts on how to debug?
On Tue, Jan 6, 2015 at 10:40 AM, Keith Turner <[email protected]> wrote:
On Mon, Jan 5, 2015 at 6:50 PM, Mike Atlas <[email protected]> wrote:
Hello,
I'm running Accumulo 1.5.2, trying to test out the GeoMesa
<http://www.geomesa.org/2014/05/28/geomesa-quickstart/> family
of spatio-temporal iterators using their quickstart
demonstration tool. I think I'm not making progress due to my
Accumulo setup, though, so can someone validate that all looks
good from here?
start-all.sh output:
hduser@accumulo:~$ $ACCUMULO_HOME/bin/start-all.sh
Starting monitor on localhost
Starting tablet servers .... done
Starting tablet server on localhost
2015-01-05 21:37:18,523 [server.Accumulo] INFO : Attempting to talk to zookeeper
2015-01-05 21:37:18,772 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2015-01-05 21:37:19,028 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost
Starting garbage collector on localhost
Starting tracer on localhost
hduser@accumulo:~$
I do believe my HDFS is set up correctly:
hduser@accumulo:/home/ubuntu/geomesa-quickstart$ hadoop fs -ls /accumulo
Found 5 items
drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:04 /accumulo/instance_id
drwxrwxrwx   - hduser supergroup          0 2015-01-05 21:22 /accumulo/recovery
drwxrwxrwx   - hduser supergroup          0 2015-01-05 20:14 /accumulo/tables
drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:04 /accumulo/version
drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:05 /accumulo/wal
However, when I check the Accumulo monitor logs, I see these
errors post-startup:
java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:264)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:103)
        at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.writeBuffer(LogSorter.java:196)
        at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.sort(LogSorter.java:166)
        at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.process(LogSorter.java:89)
        at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:101)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:745)
I don't really understand - I started Accumulo as hduser, which is the
same user that has access to the HDFS directory /accumulo/recovery, and
it looks like the directory actually was created, except for the last
part (part-r-00000):
hduser@accumulo:~$ hadoop fs -ls /accumulo/recovery/
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2015-01-05 22:11 /accumulo/recovery/87fb7aac-0274-4aea-8014-9d53dbbdfbbc
I'm not out of physical disk space:
hduser@accumulo:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 1008G 8.5G 959G 1% /
What could be going on here? Any ideas on something simple I
could have missed?
One possibility is that the tserver where the exception occurred had a
bad or missing HDFS config. In that case the Hadoop code may try to
create /accumulo/recovery/.../part-r-00000 in the local filesystem, which
would fail.
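A quick sanity check (the conf file locations below are guesses for your
layout; on Hadoop 1.x the property is fs.default.name, on 2.x it's
fs.defaultFS) is to confirm the default filesystem the daemons resolve
really is HDFS:

  # Hadoop side: the default fs should be an hdfs:// URI
  grep -A1 'fs.default' $HADOOP_HOME/conf/core-site.xml

  # Accumulo side: instance.dfs.uri, if set, should point at the same namenode
  grep -A1 'instance.dfs.uri' $ACCUMULO_HOME/conf/accumulo-site.xml

  # If this shows your local root instead of the HDFS root, the config is being missed
  hadoop fs -ls /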
Thanks,
Mike