I don't have any performance numbers handy, and I'm not sure whether
Microsoft's Azure team publishes them.
In general, my understanding is that each of them is intended to be a
"drop-in replacement". There might be some implementation-specific
configuration (e.g. account/billing), but that's it.
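As a rough illustration of what "drop-in" could mean in practice: Hadoop-compatible stores are selected by the scheme of the filesystem URI, so switching stores is mostly a matter of changing that URI plus the store-specific account settings. The sketch below only parses example URIs with the JDK. The host, container, and account names are made-up placeholders, and the exact URI a given Accumulo deployment would use is an assumption, not something confirmed in this thread.

```java
import java.net.URI;

public class VolumeSchemes {
    // Returns the filesystem scheme (e.g. "hdfs", "wasb", "adl") of a volume URI.
    static String schemeOf(String volume) {
        return URI.create(volume).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical volume URIs; all names below are placeholders.
        String[] volumes = {
            "hdfs://namenode.example.com:8020/accumulo",
            "wasb://mycontainer@myaccount.blob.core.windows.net/accumulo",
            "adl://myaccount.azuredatalakestore.net/accumulo"
        };
        for (String v : volumes) {
            System.out.println(schemeOf(v) + " <- " + v);
        }
    }
}
```

Only the scheme and authority differ between the three entries; the path under the store stays the same.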
James Hughes wrote:
Hi Josh,
Thanks again!
As a follow-up, is any of the information about Accumulo on WASB or ADL
public? I suppose I'm curious about configuration (is it just
plug-and-play?) and performance.
Thanks in advance,
Jim
On Sat, Apr 15, 2017 at 2:25 PM, Josh Elser <josh.el...@gmail.com> wrote:
As I understand it, S3 is currently still a non-starter.
Long term, Amazon may provide more features to fix the sync issue, or
someone could modify Accumulo to support putting RFiles on S3
exclusively.
Happy to expand on this further if you're curious.
On Apr 14, 2017 15:16, "James Hughes" <jn...@virginia.edu> wrote:
Hi Josh,
Thanks! It sounds like Azure's offerings provide better performance
and sync() behavior than S3? (I.e., is S3 still a no-go for Accumulo?)
Your description of WebHDFS makes total sense. I figured there was an
outside chance that WebHDFS handled or worked around the limitations
of S3, etc.
Cheers,
Jim
On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.el...@gmail.com> wrote:
Hi Jim,
I can say that Accumulo will work on Azure's blob store and their data
lake store. This is a result of testing I'm involved with at
Hortonworks (dayjob). I know that these filesystems are tested to an
appropriate degree, proving that they do provide the things that
Accumulo needs.
As a refresher, the things we need from a filesystem are performance
(Accumulo's write performance is largely dominated by I/O) and
durability guarantees (when we call sync() on a file, the data we just
wrote had better be there).
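To make the sync() requirement concrete, here is a minimal sketch of the "write, then sync, then trust the data is there" contract. It uses a local file and the JDK's FileChannel.force() as a stand-in; it is not Accumulo's write-ahead-log code, and on a Hadoop filesystem the analogous calls would be hflush()/hsync() on the output stream. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncSketch {
    // Writes the payload and forces it to the storage device before returning.
    // Returns the file size observed after the sync.
    static long writeAndSync(Path path, byte[] payload) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                ch.write(buf); // a single write() may be partial, so loop
            }
            // The durability point: only after force() returns may the
            // caller acknowledge the write to a client.
            ch.force(true);
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-sketch", ".log");
        long size = writeAndSync(log, "mutation-1".getBytes(StandardCharsets.UTF_8));
        System.out.println("synced " + size + " bytes to " + log);
        Files.deleteIfExists(log);
    }
}
```

A store whose sync call returns before the data is actually durable would break this contract, even if every write appears to succeed.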
For WebHDFS, I think performance would suffer, and I would be
surprised if it actually provided the durability guarantees. My
understanding is that WebHDFS is meant more to give non-Java clients
easy access to HDFS (as a one-off) than to act as a fully-fledged
access layer.
- Josh
On Fri, Apr 14, 2017 at 10:16 AM, James Hughes <jn...@virginia.edu> wrote:
> Hi all,
>
> I know folks have asked about Accumulo on S3 before (1).
>
> Has anyone tried running Accumulo on Azure's blob storage or data lake
> solutions (2)? (Or perhaps more generally, has anyone tried Accumulo on
> WebHDFS?)
>
> As more background, I have deployed Accumulo on HDP clouds in Azure, and
> that works great. I'm interested in using the blob / data lake storage for
> benefits with scaling, etc.
>
> Thanks in advance,
>
> Jim
>
> 1. http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-td16737.html
> 2. https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services