Accumulo Users,

Is AWS EMR's "EMRFS consistent view" useful or required for Accumulo 2 on S3? Has anyone else tried EMR + Accumulo 2 on S3?

I have incorporated *most* of the steps in the blog post

https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html

into an AWS EMR bootstrap action that creates an Accumulo cluster running on emr-6.0.0-beta2. I have not used the hadoop-aws-relocated jar, since the EMR jars are already available.

I am able to use a GeoMesa snapshot to ingest and retrieve data on the S3 volume. However, I just tried an ingest of about 10GB. It progressed smoothly for a while until the master's web UI reported "MajC Failed, extent = a<;":

java.io.IOException: Rename s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf_tmp to s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf returned false
        at org.apache.accumulo.tserver.tablet.DatafileManager.rename(DatafileManager.java:85)
        at org.apache.accumulo.tserver.tablet.DatafileManager.bringMajorCompactionOnline(DatafileManager.java:533)
        at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2051)
        at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
        at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
        at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:748)


A bit later it reported:

java.io.FileNotFoundException: No such file or directory 's3://THEBUCKET/accumulo/tables/c/t-0000090/F00000nz.rf'
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:808)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1212)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:902)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:207)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$CachableBuilder.lambda$fsPath$0(CachableBlockFile.java:91)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:172)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:400)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1156)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1251)
        at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:53)
        at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:68)
        at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:83)
        at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:478)
        at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:299)
        at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:344)
        at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:225)
        at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2039)
        at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
        at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
        at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:748)


These look like the same sort of problems HBase on EMR can have when EMRFS isn't functioning properly.

--Kevin

On 3/3/20 1:57 PM, Jim Hughes wrote:
Hi all,

The next major release of GeoMesa is aimed at supporting Accumulo 2.x. As part of testing, my coworker Kevin and I are trying out Accumulo 2.0 on S3.

Keith's blog post [1] is great. For those who have tested Accumulo 2.0 in AWS, has anyone tried using EMR for the underlying HDFS cluster (and then installing Accumulo via bootstrap actions)? Is there a preferred or suggested deployment strategy?

Cheers,

Jim

1. https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
