Re: Accumulo on S3

Kevin Hobbs Thu, 02 Apr 2020 12:57:40 -0700

Accumulo Users,

Is AWS EMR's "EMRFS consistent view" useful or required for Accumulo 2 on S3? Has anyone else tried EMR + Accumulo 2 on S3?


I have incorporated *most* of the steps in the blog post

https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html

into an AWS EMR bootstrap action that creates an Accumulo cluster running on emr-6.0.0-beta2. I have not used the hadoop-aws-relocated jar, as the EMR jars are available.

I am able to use a GeoMesa snapshot to ingest and retrieve data on the s3 volume. However, I just tried an ingest of about 10GB, which progressed smoothly for a while until the master's web UI reported "MajC Failed, extent = a<;":

java.io.IOException: Rename s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf_tmp to s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf returned false
        at org.apache.accumulo.tserver.tablet.DatafileManager.rename(DatafileManager.java:85)
        at org.apache.accumulo.tserver.tablet.DatafileManager.bringMajorCompactionOnline(DatafileManager.java:533)
        at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2051)
        at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
        at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
        at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:748)


A bit later it reported:

java.io.FileNotFoundException: No such file or directory 's3://THEBUCKET/accumulo/tables/c/t-0000090/F00000nz.rf'
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:808)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1212)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:902)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:207)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$CachableBuilder.lambda$fsPath$0(CachableBlockFile.java:91)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:172)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:400)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1156)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1251)
        at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:53)
        at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:68)
        at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:83)
        at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:478)
        at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:299)
        at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:344)
        at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:225)
        at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2039)
        at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
        at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
        at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.lang.Thread.run(Thread.java:748)

These seem like the same sort of problems HBase on EMR can have when EMRFS isn't functioning properly.


--Kevin

On 3/3/20 1:57 PM, Jim Hughes wrote:

Hi all,
The next major release of GeoMesa is aimed at supporting Accumulo 2.x. As part of testing, my coworker Kevin and I are trying out Accumulo 2.0 on S3.
Keith's blog post[1] is great. As people have tested Accumulo 2.0 in AWS, has anyone tried using EMR for the underlying HDFS cluster (and then installing Accumulo via bootstrap actions)? Is there a preferred/suggested deployment strategy?
Cheers,

Jim

1. https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
