Hi Josh, Thanks again!
As a follow-up, is any of the information about Accumulo on WASB or ADL public? I suppose I'm curious about configuration (is it just plug-and-play?) and performance.

Thanks in advance,

Jim

On Sat, Apr 15, 2017 at 2:25 PM, Josh Elser <josh.el...@gmail.com> wrote:

> As I understand it, S3 is currently still a non-starter.
>
> Long term, Amazon may provide some more features to fix the sync issue.
> Or, someone can modify Accumulo to support putting rfiles on S3 exclusively.
>
> Happy to expand on this further if you're curious.
>
> On Apr 14, 2017 15:16, "James Hughes" <jn...@virginia.edu> wrote:
>
> Hi Josh,
>
> Thanks! Sounds like Azure's offerings are providing better performance
> and sync()'ing over S3? (I.e., is S3 still a no-go for Accumulo?)
>
> Your description of WebHDFS makes total sense. I figured there may be
> an outside chance that WebHDFS handled or worked around limitations from
> S3, etc.
>
> Cheers,
>
> Jim
>
> On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
>> Hi Jim,
>>
>> I can say that Accumulo will work on Azure's blob store and their data
>> lake store. These are a result of testing I'm involved with at
>> Hortonworks (dayjob). I know that these filesystems are tested to an
>> appropriate degree, proving that they do provide the things that
>> Accumulo needs.
>>
>> As a refresher, the things we need from a filesystem are: performance
>> (Accumulo's write performance is pretty dominated by I/O) and
>> durability guarantees (when we call sync() on a file, the data we just
>> wrote better be there).
>>
>> For WebHDFS, I think you would both hurt for performance and I would
>> be surprised if it actually provided the durability correctness. My
>> understanding is that WebHDFS is more meant to allow non-Java clients
>> easy access to HDFS (as a one-off) rather than act as a fully-fledged
>> access layer.
>>
>> - Josh
>>
>> On Fri, Apr 14, 2017 at 10:16 AM, James Hughes <jn...@virginia.edu>
>> wrote:
>> > Hi all,
>> >
>> > I know folks have asked about Accumulo on S3 before (1).
>> >
>> > Has anyone tried running Accumulo on Azure's blob storage or data lake
>> > solutions (2)? (Or perhaps more generally, has anyone tried Accumulo on
>> > WebHDFS?)
>> >
>> > As more background, I have deployed Accumulo on HDP clouds in Azure, and
>> > that works great. I'm interested in using the blob / data lake storage
>> > for benefits with scaling, etc.
>> >
>> > Thanks in advance,
>> >
>> > Jim
>> >
>> > 1. http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-td16737.html
>> > 2. https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services
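
The durability requirement Josh describes in the thread (once sync() returns, the data just written had better be there) can be illustrated on a local filesystem. The following is a minimal Python sketch, not Accumulo code: it assumes `os.fsync` as a local stand-in for the sync call a distributed filesystem would expose, and the function names and the `example.wal` file name are hypothetical.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> int:
    """Append payload, returning only after flush + fsync. Returns the file size."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    return os.path.getsize(path)


def read_back(path: str) -> bytes:
    """Read the whole file, as a recovery process would after a restart."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal = os.path.join(d, "example.wal")  # hypothetical log file name
        durable_append(wal, b"mutation-1\n")
        durable_append(wal, b"mutation-2\n")
        # Everything acknowledged by durable_append must be readable afterwards.
        assert read_back(wal) == b"mutation-1\nmutation-2\n"
```

A store that accepts the write but cannot honor the sync step cannot offer this guarantee to a write-ahead log, which appears to be the "sync issue" raised about S3 in the thread.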