It sounds like you're running into the known S3 consistency issues. However, I don't know whether EMRFS is supposed to support all of the things that Accumulo requires. I would assume that EMRFS should be bridging the gap from S3 (a blobstore) to the consistent, distributed FileSystem that Accumulo expects. Their summary[1] indicates that consistent listings and read-after-write are solved, which is a big part of the problem. I'm not sure whether you are also supposed to get atomic rename from it.
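
One quick way to see what the EMR filesystem actually gives you is to exercise the rename-and-check pattern that compactions depend on. Below is a rough, untested sketch (the bucket/prefix are placeholders and "RenameProbe" is just a name I made up); it mirrors the fact that Accumulo treats a false return from FileSystem.rename() as a failure, as in the DatafileManager stack trace in your message below.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder bucket/prefix -- substitute your own.
    URI root = URI.create("s3://THEBUCKET/rename-probe/");
    FileSystem fs = FileSystem.get(root, new Configuration());

    Path tmp = new Path(root.resolve("A000000.rf_tmp"));
    Path dst = new Path(root.resolve("A000000.rf"));

    // Write a small file, the way a compaction first writes an RFile to *_tmp.
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      out.writeUTF("probe");
    }

    // Accumulo treats a false return here as a failed compaction.
    boolean renamed = fs.rename(tmp, dst);
    System.out.println("rename returned " + renamed);

    // Read-after-rename: the destination must be immediately visible
    // and the temp name gone, or subsequent opens will fail.
    System.out.println("dest exists: " + fs.exists(dst));
    System.out.println("tmp exists:  " + fs.exists(tmp));
  }
}

If rename ever comes back false, or the destination takes a while to show up, that would line up with both of the exceptions you pasted.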

This presentation[2] is a primer I put together earlier this year on cloud storage for BigTables, which may help you understand what's going on. I gave it at a meetup here in MD a couple of months back, but I don't think it was recorded.

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html
[2] https://drive.google.com/file/d/1Or1s-X0JjiLM87HKIOWlh3WlkdUQfYH9/view?usp=sharing

On 4/2/20 3:56 PM, Kevin Hobbs wrote:
Accumulo Users,

Is AWS EMR's "EMRFS consistent view" useful or required for Accumulo2 on S3? Has anyone else tried EMR + Accumulo2 on S3?

I have incorporated *most* of the steps in the blog post

https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html

into an AWS EMR bootstrap action that creates an Accumulo cluster running on emr-6.0.0-beta2. I have not used the hadoop-aws-relocated jar, since the EMR jars are available.
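
For context, the volume layout follows the blog post's pattern, roughly the following (bucket and namenode names are placeholders, s3:// in place of the post's s3a:// since I'm using the EMR jars, and the chooser property names are taken from the post, so worth double-checking against your version):

# WALs stay on HDFS; table data goes to S3 (NAMENODE/THEBUCKET are placeholders).
instance.volumes=hdfs://NAMENODE/accumulo-wal,s3://THEBUCKET/accumulo
general.volume.chooser=org.apache.accumulo.server.fs.PreferredVolumeChooser
general.custom.volume.preferred.default=s3://THEBUCKET/accumulo
general.custom.volume.preferred.logger=hdfs://NAMENODE/accumulo-wal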

I am able to use a GeoMesa snapshot to ingest and retrieve data on the S3 volume. However, I just tried an ingest of about 10 GB, which progressed smoothly for a while until the master's web UI reported "MajC Failed, extent = a<;":

java.io.IOException: Rename s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf_tmp to s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf returned false
    at org.apache.accumulo.tserver.tablet.DatafileManager.rename(DatafileManager.java:85)
    at org.apache.accumulo.tserver.tablet.DatafileManager.bringMajorCompactionOnline(DatafileManager.java:533)
    at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2051)
    at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
    at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
    at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:748)


A bit later it reported:

java.io.FileNotFoundException: No such file or directory 's3://THEBUCKET/accumulo/tables/c/t-0000090/F00000nz.rf'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:808)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1212)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:902)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:207)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$CachableBuilder.lambda$fsPath$0(CachableBlockFile.java:91)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:172)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:400)
    at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1156)
    at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1251)
    at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:53)
    at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:68)
    at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:83)
    at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:478)
    at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:299)
    at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:344)
    at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:225)
    at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2039)
    at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
    at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
    at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:748)


These seem like the same sorts of problems HBase on EMR can have when EMRFS isn't functioning properly.

--Kevin

On 3/3/20 1:57 PM, Jim Hughes wrote:
Hi all,

The next major release of GeoMesa is aimed at supporting Accumulo 2.x. As part of testing, my coworker Kevin and I are trying out Accumulo 2.0 on S3.

Keith's blog post[1] is great. As people have been testing Accumulo 2.0 in AWS, has anyone tried using EMR for the underlying HDFS cluster (and then installing Accumulo via bootstrap actions)? Is there a preferred/suggested deployment strategy?

Cheers,

Jim

1. https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
