Hi Josh,

Thanks! Sounds like Azure's offerings provide better performance and sync() behavior than S3? (I.e., is S3 still a no-go for Accumulo?)

Your description of WebHDFS makes total sense. I figured there was an outside chance that WebHDFS handled or worked around the limitations of S3, etc.

Cheers,

Jim

On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.el...@gmail.com> wrote:
> Hi Jim,
>
> I can say that Accumulo will work on Azure's blob store and their data
> lake store. These are a result of testing I'm involved with at
> Hortonworks (dayjob). I know that these filesystems are tested to an
> appropriate degree, proving that they do provide the things that
> Accumulo needs.
>
> As a refresher, the things we need from a filesystem are: performance
> (Accumulo's write performance is pretty dominated by I/O) and
> durability guarantees (when we call sync() on a file, the data we just
> wrote better be there).
>
> For WebHDFS, I think you would both hurt for performance and I would
> be surprised if it actually provided the durability correctness. My
> understanding is that WebHDFS is more meant to allow non-Java clients
> easy access to HDFS (as a one-off) rather than act as a fully-fledged
> access layer.
>
> - Josh
>
> On Fri, Apr 14, 2017 at 10:16 AM, James Hughes <jn...@virginia.edu> wrote:
> > Hi all,
> >
> > I know folks have asked about Accumulo on S3 before (1).
> >
> > Has anyone tried running Accumulo on Azure's blob storage or data lake
> > solutions (2)? (Or perhaps more generally, has anyone tried Accumulo on
> > WebHDFS?)
> >
> > As more background, I have deployed Accumulo on HDP clouds in Azure, and
> > that works great. I'm interested in using the blob / data lake storage
> > for benefits with scaling, etc.
> >
> > Thanks in advance,
> >
> > Jim
> >
> > 1. http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-td16737.html
> > 2. https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services