Hi Josh,

Thanks again!

As a follow-up, is any of the information about Accumulo on WASB or ADL
public?  I suppose I'm curious about configuration (is it just
plug-and-play?) and performance.

Thanks in advance,

Jim

On Sat, Apr 15, 2017 at 2:25 PM, Josh Elser <josh.el...@gmail.com> wrote:

> As I understand it, S3 is currently still a non-starter.
>
> Long term, Amazon may provide some more features to fix the sync issue.
> Or, someone can modify Accumulo to support putting rfiles on S3 exclusively.
>
> Happy to expand on this further if you're curious.
>
>
> On Apr 14, 2017 15:16, "James Hughes" <jn...@virginia.edu> wrote:
>
> Hi Josh,
>
> Thanks!  Sounds like Azure's offerings are providing better performance
> and sync()'ing over S3?  (I.e., is S3 still a no-go for Accumulo?)
>
> Your description of WebHDFS makes total sense.  I figured there may be
> an outside chance that WebHDFS handled or worked around limitations from
> S3, etc.
>
> Cheers,
>
> Jim
>
> On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
>> Hi Jim,
>>
>> I can say that Accumulo will work on Azure's blob store and their data
>> lake store. This is based on testing I'm involved with at
>> Hortonworks (day job). I know that these filesystems are tested to an
>> appropriate degree, proving that they do provide the things that
>> Accumulo needs.
>>
>> As a refresher, the things we need from a filesystem are: performance
>> (Accumulo's write performance is pretty dominated by I/O) and
>> durability guarantees (when we call sync() on a file, the data we just
>> wrote better be there).
>>
>> For WebHDFS, I think you would hurt for performance, and I would be
>> surprised if it actually provided the durability guarantees. My
>> understanding is that WebHDFS is more meant to allow non-Java clients
>> easy access to HDFS (as a one-off) rather than act as a fully-fledged
>> access layer.
>>
>> - Josh
>>
>> On Fri, Apr 14, 2017 at 10:16 AM, James Hughes <jn...@virginia.edu>
>> wrote:
>> > Hi all,
>> >
>> > I know folks have asked about Accumulo on S3 before (1).
>> >
>> > Has anyone tried running Accumulo on Azure's blob storage or data lake
>> > solutions (2)?  (Or perhaps more generally, has anyone tried Accumulo on
>> > WebHDFS?)
>> >
>> > As more background, I have deployed Accumulo on HDP clouds in Azure, and
>> > that works great.  I'm interested in using the blob / data lake storage for
>> > benefits with scaling, etc.
>> >
>> > Thanks in advance,
>> >
>> > Jim
>> >
>> > 1.  http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-td16737.html
>> > 2.  https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services
>>
>
>
>
