
Re: FileSystem API (was: Slack call notes)

2020-04-29 Thread Michael Wall


Re: FileSystem API (was: Slack call notes)

2020-04-28 Thread Christopher


Re: FileSystem API (was: Slack call notes)

2020-04-28 Thread Michael Wall

> > > > > > > some abstraction, wherever possible. Though, I'll admit, we're
> > > > > > > nowhere close to where we'd want to be to be fully decoupled
> > > > > > > from Hadoop.
> > > > > > >
> > > > > > > I've also been looking a lot at our VolumeManager code lately,
> > > > > > > to try to improve it a bit, and to create better abstractions
> > > > > > > for Volumes, that could aid future work in this area.
> > > > > > >
> > > > > > > But, I haven't directly been working on new FileSystem API
> > > > > > > abstraction... just trying to lay some groundwork for that
> > > > > > > possibility in future.
> > > > > > >
> > > > > > > It'd be nice to get to a point where we have a Hadoop-specific
> > > > > > > implementation isolated to a jar that can be swapped out at
> > > > > > > runtime for other file system implementations, as needed. I see
> > > > > > > that as a somewhat long-way off.
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 2:08 PM  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >   I couldn't make the call today, but am curious if anyone has
> > > > > > > > previously brought up creating a FileSystem API for Accumulo so
> > > > > > > > that we could use implementations other than Hadoop. I realize
> > > > > > > > that Hadoop provides implementations for things other than HDFS
> > > > > > > > but that doesn't necessarily mean that all filesystem
> > > > > > > > implementations are covered.
> > > > > > > >
> > > > > > > > -Original Message-
> > > > > > > > From: Christopher
> > > > > > > > Sent: Wednesday, March 25, 2020 1:45 PM
> > > > > > > > To: accumulo-dev
> > > > > > > > Subject: Slack call notes
> > > > > > > >
> > > > > > > > Several committers/contributors in the community joined a call
> > > > > > > > in Slack on Wednesday, at 1130-1230, New York (Eastern) time.
> > > > > > > > Here are my notes of the call. Please feel free to add to them.
> > > > > > > >
> > > > > > > > I shared the overall philosophy and backstory to some of the
> > > > > > > > script improvements in 2.x to help guide current/future work on
> > > > > > > > the scripts.
> > > > > > > >
> > > > > > > > * bin/accumulo is inspired by old jpackage.org standards which
> > > > > > > > are still in use in RPM macros for Java packaging in
> > > > > > > > Fedora/RHEL/etc. The key idea is that scripts are simple... set
> > > > > > > > up environment (class path, etc.), locate java, and exec a
> > > > > > > > single process with the provided args.
> > > > > > > > * bin/accumulo-service is inspired by old SysVInit scripts for
> > > > > > > > start/stop/restart/status of a single service
> > > > > > > > * behavior of bin/accumulo and bin/accumulo-service can be
> > > > > > > > manipulated through launch environment
> > > > > > > > * bin/accumulo-cluster uses bin/accumulo-service, and is provided
> > > > > > > > as a simple, out-of-the-box cluster management tool
> > > > > > > > * bin/accumulo-cluster and bin/accumulo-service are replaceable;
> > > > > > > > they are useful for out-of-the-box, but one would expect them to
> > > > > > > > be unnecessary if using systemd, or a vendor-provided cluster
> > > > > > > > management system
> > > > > > > > * we discussed possibly moving bin/accumulo-cluster and
> > > > > > > > bin/accumulo-service to contrib/ in the tarball, or some subdir
> > > > > > > > of bin/, but it was suggested to not make too many disruptive
> > > > > > > > changes there
> > > > > > > > * we discussed the possibility of adding a config file for
> > > > > > > > bin/accumulo-cluster (also mentioned on
> > > > > > > > https://github.com/apache/accumulo/pull/1568)
> > > > > > > > * we discussed the need to document the intent/purpose/scope of
> > > > > > > > the scripts in comments inside the scripts themselves
> > > > > > > > * Ed Coleman asked if it'd be good to document a systemd example;
> > > > > > > > I suggested it might make for a good blog post (perhaps by the
> > > > > > > > person who wrote the systemd unit files for Fluo Muchos)
> > > > > > > >
> > > > > > > > Keith Turner discussed his development efforts with regard to
> > > > > > > > enabling more controls over compactions.
> > > > > > > >
> > > > > > > > * one main idea was to keep configuration/API for data separate
> > > > > > > > from that for execution
> > > > > > > > * data is concerns to application owners, whereas execution
> > > > > > > > involves system admins (resource contention, etc.)
> > > > > > > > * he will submit a PR for review when ready
> > > > > > > > * he also suggested another call to go over the PR
> > > > > > > >
> > > > > > > > Billie Rinaldi discussed better support for Azure Data Lake
> > > > > > > > Storage Gen2 (ADLSv2).
> > > > > > > >
> > > > > > > > * maintaining a fork for experimenting, and working on reliably
> > > > > > > > testing issues involving WALs
> > > > > > > > * did not recommend using ADLSv2 with WALs, but that we should
> > > > > > > > still support it
> > > > > > > > * might need to implement a custom log closer to better support it
> > > > > > > >
> > > > > > > > Mike Miller brought up the idea of eliminating more static
> > > > > > > > internal state.
> > > > > > > >
> > > > > > > > * ServerConfigurationFactory might be improved in this regard,
> > > > > > > > with some additional ZK cleanup
> > > > > > > > * Other ZK cleanup might help elsewhere (such as ZooCache)
> > > > > > > > * I suggested tablet location cache might also benefit from being
> > > > > > > > bound to an AccumuloClient lifecycle (or a dedicated opaque
> > > > > > > > object that could be shared across AccumuloClient instances with
> > > > > > > > its own user-managed lifecycle)
> > > > > > > >
> > > > > > > > Please add anything I might have missed (or got wrong) in
> > > > > > > > response to this post.
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
>
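To make the last call note concrete, here is a minimal Java sketch of a cache bound to a user-managed lifecycle instead of static state. `TabletLocationCache` and `SketchClient` are hypothetical names made up for illustration (they are not Accumulo classes), and tablets and servers are plain strings. A client either creates and closes its own private cache, or borrows a shared one that the user closes explicitly.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical location cache whose lifetime is controlled by its owner, not the JVM. */
final class TabletLocationCache implements AutoCloseable {
  // tablet id -> tablet server address; plain strings for illustration only
  private final Map<String, String> locations = new ConcurrentHashMap<>();
  private final AtomicBoolean closed = new AtomicBoolean(false);

  void put(String tablet, String server) {
    checkOpen();
    locations.put(tablet, server);
  }

  Optional<String> get(String tablet) {
    checkOpen();
    return Optional.ofNullable(locations.get(tablet));
  }

  private void checkOpen() {
    if (closed.get()) {
      throw new IllegalStateException("cache is closed");
    }
  }

  @Override
  public void close() {
    // Idempotent: only the first close clears the entries.
    if (closed.compareAndSet(false, true)) {
      locations.clear();
    }
  }
}

/** Hypothetical client that owns a private cache or borrows a shared, user-managed one. */
final class SketchClient implements AutoCloseable {
  private final TabletLocationCache cache;
  private final boolean ownsCache;

  /** Private cache: created with this client and closed with it. */
  SketchClient() {
    this(new TabletLocationCache(), true);
  }

  /** Shared cache: the caller decides when to close it. */
  SketchClient(TabletLocationCache shared) {
    this(shared, false);
  }

  private SketchClient(TabletLocationCache cache, boolean ownsCache) {
    this.cache = cache;
    this.ownsCache = ownsCache;
  }

  TabletLocationCache cache() {
    return cache;
  }

  @Override
  public void close() {
    // Only close what this client created; a borrowed cache outlives it.
    if (ownsCache) {
      cache.close();
    }
  }
}
```

The design choice is only about ownership: since nothing here is static, two clients in the same JVM share locations only when the user deliberately hands them the same cache object.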

Re: FileSystem API (was: Slack call notes)

2020-04-24 Thread Christopher

I'm not familiar with it, but the website says it can replace HDFS.
There appears to be an "HDFS Gateway"
(https://github.com/minio/minio/blob/master/docs/gateway/hdfs.md) that
might be useful. At a glance, it looks like no abstraction is needed
in Accumulo code for it... you just run the gateway and
Accumulo believes it is using HDFS, but it is really using MinIO
instead.

There also might be a Hadoop FileSystem implementation for it to use
it directly without a Gateway, but I didn't have any luck with a quick
search for one.

In either case, there shouldn't need to be any changes to Accumulo itself.

If changes to Accumulo do become necessary (or desired), I'd be
interested in collaborating on that part. If it's just a matter of
trying it with the Gateway or existing Hadoop FileSystem
implementation, I'd also be interested in testing any step-by-step
HOWTO guides somebody might want to write as a blog post.

On Fri, Apr 24, 2020 at 11:20 AM Mike Miller  wrote:
>
> I have no experience with MinIO but would be interested in learning more
> and collaborating.
>
> On Fri, Apr 24, 2020 at 10:57 AM Michael Wall  wrote:
>
> > Resurrecting this thread on the File System API.  I have been thinking
> > about giving Minio [1] a try for both WALs and RFiles.  Seems to me like
> > step one is to abstract internal interfaces for both targeted against 2.1?
> > Couple of questions
> >
> > 1 - Anyone have experience with minio?
> > 2 - Anyone interested in collaborating?  Thinking anything from providing
> > input to helping to test once we get a prototype to actually doing some
> > development.
> >
> > Thanks, hope everyone is staying safe and healthy.
> >
> > [1] - https://min.io/

Re: FileSystem API (was: Slack call notes)

2020-04-24 Thread Mike Miller

I have no experience with MinIO but would be interested in learning more
and collaborating.

On Fri, Apr 24, 2020 at 10:57 AM Michael Wall  wrote:

> Resurrecting this thread on the File System API.  I have been thinking
> about giving Minio [1] a try for both WALs and RFiles.  Seems to me like
> step one is to abstract internal interfaces for both targeted against 2.1?
> Couple of questions
>
> 1 - Anyone have experience with minio?
> 2 - Anyone interested in collaborating?  Thinking anything from providing
> input to helping to test once we get a prototype to actually doing some
> development.
>
> Thanks, hope everyone is staying safe and healthy.
>
> [1] - https://min.io/

Re: FileSystem API (was: Slack call notes)

2020-04-24 Thread Michael Wall

Resurrecting this thread on the File System API.  I have been thinking
about giving Minio [1] a try for both WALs and RFiles.  Seems to me like
step one is to abstract internal interfaces for both targeted against 2.1?
Couple of questions

1 - Anyone have experience with minio?
2 - Anyone interested in collaborating?  Thinking anything from providing
input to helping to test once we get a prototype to actually doing some
development.

Thanks, hope everyone is staying safe and healthy.

[1] - https://min.io/
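As a rough illustration of "abstract internal interfaces for both", the sketch below assumes a hypothetical `FileStore` interface that RFile and WAL code could target instead of Hadoop's `FileSystem`, plus a registry that picks an implementation by URI scheme. All names are invented for this example. A real version would more likely discover implementations from a separate jar (for example with `java.util.ServiceLoader`); here they are registered explicitly so the sketch stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage interface for RFiles and WALs; not an Accumulo API. */
interface FileStore {
  /** URI scheme handled by this store, e.g. "hdfs" or "s3a". */
  String scheme();

  OutputStream create(URI uri) throws IOException;

  InputStream open(URI uri) throws IOException;

  boolean delete(URI uri) throws IOException;
}

/** Picks a store by URI scheme; an instance rather than static state, so its owner controls it. */
final class FileStoreRegistry {
  private final Map<String, FileStore> byScheme = new ConcurrentHashMap<>();

  void register(FileStore store) {
    byScheme.put(store.scheme(), store);
  }

  FileStore forUri(URI uri) {
    String scheme = uri.getScheme();
    FileStore store = scheme == null ? null : byScheme.get(scheme);
    if (store == null) {
      throw new IllegalArgumentException("no FileStore registered for scheme " + scheme);
    }
    return store;
  }
}

/** Toy in-memory store standing in for an HDFS- or MinIO-backed implementation. */
final class MemoryFileStore implements FileStore {
  private final Map<URI, byte[]> files = new ConcurrentHashMap<>();

  @Override
  public String scheme() {
    return "mem";
  }

  @Override
  public OutputStream create(URI uri) {
    return new ByteArrayOutputStream() {
      @Override
      public void close() throws IOException {
        super.close();
        files.put(uri, toByteArray()); // contents become visible only once closed
      }
    };
  }

  @Override
  public InputStream open(URI uri) throws IOException {
    byte[] data = files.get(uri);
    if (data == null) {
      throw new FileNotFoundException(uri.toString());
    }
    return new ByteArrayInputStream(data);
  }

  @Override
  public boolean delete(URI uri) {
    return files.remove(uri) != null;
  }
}
```

This toy interface leaves out what WALs actually depend on, such as durable sync semantics and something like the custom log closer mentioned in the call notes; a Hadoop-backed implementation would presumably wrap `org.apache.hadoop.fs.FileSystem` behind the same methods.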

On Wed, Mar 25, 2020 at 6:08 PM Christopher  wrote:

> Only 705 across 280 files, if you exclude Text, though :)
>
> grep -rP 'org[.]apache[.]hadoop(?![.]io[.]Text)' --include='*.java' *
> | grep -v test/ | wc -l

Re: FileSystem API (was: Slack call notes)

2020-03-25 Thread Christopher

Only 705 across 280 files, if you exclude Text, though :)

grep -rP 'org[.]apache[.]hadoop(?![.]io[.]Text)' --include='*.java' *
| grep -v test/ | wc -l
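The grep pipeline above counts matching lines; `grep -l` would count files instead. For anyone without GNU grep's `-P`, the following Java sketch applies the same lookahead regex (which `java.util.regex` supports) and reports both numbers. `HadoopRefCounter` is a made-up name, and skipping any path containing `test/` only approximates `grep -v test/`, which filters on the whole `path:line` output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Rough Java counterpart of: grep -rP '...' --include='*.java' * | grep -v test/ | wc -l */
final class HadoopRefCounter {
  // Same expression as the grep -P command; (?!...) is a negative lookahead.
  static final Pattern HADOOP_NOT_TEXT = Pattern.compile("org[.]apache[.]hadoop(?![.]io[.]Text)");

  /** Number of matching lines, and of distinct files containing at least one. */
  static final class Counts {
    final long lines;
    final long files;

    Counts(long lines, long files) {
      this.lines = lines;
      this.files = files;
    }
  }

  /** Counts over in-memory sources keyed by relative path using '/' separators. */
  static Counts count(Map<String, String> sourcesByPath) {
    long lines = 0;
    long files = 0;
    for (Map.Entry<String, String> entry : sourcesByPath.entrySet()) {
      String path = entry.getKey();
      // --include='*.java', plus an approximation of grep -v test/ applied to the path only
      if (!path.endsWith(".java") || path.contains("test/")) {
        continue;
      }
      long hits = entry.getValue().lines().filter(l -> HADOOP_NOT_TEXT.matcher(l).find()).count();
      lines += hits;
      if (hits > 0) {
        files++;
      }
    }
    return new Counts(lines, files);
  }

  /** Reads every *.java file under root (UTF-8 assumed) and counts it. */
  static Counts countTree(Path root) throws IOException {
    List<Path> javaFiles;
    try (Stream<Path> walk = Files.walk(root)) {
      javaFiles = walk.filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java"))
          .collect(Collectors.toList());
    }
    Map<String, String> sources = new TreeMap<>();
    for (Path p : javaFiles) {
      String rel = root.relativize(p).toString().replace('\\', '/');
      sources.put(rel, Files.readString(p, StandardCharsets.UTF_8));
    }
    return count(sources);
  }
}
```

Run from the root of a checkout as `HadoopRefCounter.countTree(Path.of("."))`, it should land in the same ballpark as the numbers above, though exact agreement with grep is not guaranteed.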

On Wed, Mar 25, 2020 at 3:34 PM Mike Miller  wrote:
>
> I think we have come a long way removing any external types from the API,
> for reasons other than de-coupling from Hadoop.  While we don't have many
> dependencies on the other components of Hadoop, we are still very tightly
> coupled to HDFS.
> For example, some quick grep'ing of the code shows:
> grep -r "import org.apache.hadoop" --include=*.java * | wc -l
> 1734
> Without tests it is slightly more feasible...
> grep -r "import org.apache.hadoop" --include=*.java * | grep -v "test" | wc -l
> 858

Re: FileSystem API (was: Slack call notes)

2020-03-25 Thread Mike Miller

I think we have come a long way removing any external types from the API,
for reasons other than de-coupling from Hadoop.  While we don't have many
dependencies on the other components of Hadoop, we are still very tightly
coupled to HDFS.
For example, some quick grep'ing of the code shows:
grep -r "import org.apache.hadoop" --include=*.java * | wc -l
1734
Without tests it is slightly more feasible...
grep -r "import org.apache.hadoop" --include=*.java * | grep -v "test" | wc -l
858


On Wed, Mar 25, 2020 at 3:19 PM David Mollitor  wrote:

> Hello,
>
> I too have been thinking about this for a pet project.  There is already
> Apache Commons VFS that, with some investment, could probably serve all
> these requirements.
>
> On Wed, Mar 25, 2020, 3:16 PM Christopher  wrote:
>
> > (Forking this thread, as it's a distinct topic)
> >
> > I've thought about it. The idea has driven me to try to reduce our use
> > of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
> > some abstraction, wherever possible. Though, I'll admit, we're nowhere
> > close to where we'd want to be to be fully decoupled from Hadoop.
> >
> > I've also been looking a lot at our VolumeManager code lately, to try
> > to improve it a bit, and to create better abstractions for Volumes,
> > that could aid future work in this area.
> >
> > But, I haven't directly been working on new FileSystem API
> > abstraction... just trying to lay some groundwork for that possibility
> > in future.
> >
> > It'd be nice to get to a point where we have a Hadoop-specific
> > implementation isolated to a jar that can be swapped out at runtime
> > for other file system implementations, as needed. I see that as a
> > somewhat long-way off.
> >
> > On Wed, Mar 25, 2020 at 2:08 PM  wrote:
> > >
> > >
> > >   I couldn't make the call today, but am curious if anyone has
> > > previously brought up creating a FileSystem API for Accumulo so that we
> > > could use implementations other than Hadoop. I realize that Hadoop
> > > provides implementations for things other than HDFS but that doesn't
> > > necessarily mean that all filesystem implementations are covered.
> > >
> > > -Original Message-
> > > From: Christopher 
> > > Sent: Wednesday, March 25, 2020 1:45 PM
> > > To: accumulo-dev 
> > > Subject: Slack call notes
> > >
> > > Several committers/contributors in the community joined a call in Slack
> > on Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of
> > the call. Please feel free to add to them.
> > >
> > > I shared the overall philosophy and backstory to some of the script
> > improvements in 2.x to help guide current/future work on the scripts.
> > >
> > > * bin/accumulo is inspired by old jpackage.org standards which are
> > still in use in RPM macros for Java packaging in Fedora/RHEL/etc. The key
> > idea is that scripts are simple... set up environment (class path, etc.),
> > locate java, and exec a single process with the provided args.
> > > * bin/accumulo-service is inspired by old SysVInit scripts for
> > start/stop/restart/status of a single service
> > > * behavior of bin/accumulo and bin/accumulo-service can be manipulated
> > through launch environment
> > > * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
> > simple, out-of-the-box cluster management tool
> > > * bin/accumulo-cluster and bin/accumulo-service are replaceable; they
> > are useful for out-of-the-box, but one would expect them to be
> unnecessary
> > if using systemd, or a vendor-provided cluster management system
> > > * we discussed possibly moving bin/accumulo-cluster and
> > bin/accumulo-service to contrib/ in the tarball, or some subdir of bin/,
> > but it was suggested to not make too many disruptive changes there
> > > * we discussed the possibility of adding a config file for
> > bin/accumulo-cluster (also mentioned on
> > > https://github.com/apache/accumulo/pull/1568)
> > > * we discussed the need to document the intent/purpose/scope of the
> > scripts in comments inside the scripts themselves
> > > * Ed Coleman asked if it'd be good to document a systemd example; I
> > suggested it might make for a good blog post (perhaps by the person who
> > wrote the systemd unit files for Fluo Muchos)
> > >
> > > Keith Turner discussed his development efforts with regard to enabling
> > more controls over compactions.
> > >
> > > * one main idea was to ke

Re: FileSystem API (was: Slack call notes)

2020-03-25 Thread David Mollitor

Hello,

I too have been thinking about this for a pet project.  There is already
Apache Commons VFS that, with some investment, could probably serve all
these requirements.

On Wed, Mar 25, 2020, 3:16 PM Christopher  wrote:

> (Forking this thread, as it's a distinct topic)
>
> I've thought about it. The idea has driven me to try to reduce our use
> of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
> some abstraction, wherever possible. Though, I'll admit, we're nowhere
> close to where we'd want to be to be fully decoupled from Hadoop.
>
> I've also been looking a lot at our VolumeManager code lately, to try
> to improve it a bit, and to create better abstractions for Volumes,
> that could aid future work in this area.
>
> But, I haven't directly been working on new FileSystem API
> abstraction... just trying to lay some groundwork for that possibility
> in future.
>
> It'd be nice to get to a point where we have a Hadoop-specific
> implementation isolated to a jar that can be swapped out at runtime
> for other file system implementations, as needed. I see that as a
> somewhat long way off.
>
> On Wed, Mar 25, 2020 at 2:08 PM  wrote:
> >
> >
> >   I couldn't make the call today, but am curious if anyone has
> previously brought up creating a FileSystem API for Accumulo so that we
> could use implementations other than Hadoop. I realize that Hadoop provides
> implementations for things other than HDFS but that doesn't necessarily
> mean that all filesystem implementations are covered.
>

Re: Slack call notes

2020-03-25 Thread Christopher

Replied in a new thread.

On Wed, Mar 25, 2020 at 2:08 PM  wrote:
>
>
>   I couldn't make the call today, but am curious if anyone has previously 
> brought up creating a FileSystem API for Accumulo so that we could use 
> implementations other than Hadoop. I realize that Hadoop provides 
> implementations for things other than HDFS but that doesn't necessarily mean 
> that all filesystem implementations are covered.

FileSystem API (was: Slack call notes)

2020-03-25 Thread Christopher

(Forking this thread, as it's a distinct topic)

I've thought about it. The idea has driven me to try to reduce our use
of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
some abstraction, wherever possible. Though, I'll admit, we're nowhere
close to where we'd want to be to be fully decoupled from Hadoop.

I've also been looking a lot at our VolumeManager code lately, to try
to improve it a bit, and to create better abstractions for Volumes,
that could aid future work in this area.

But, I haven't directly been working on new FileSystem API
abstraction... just trying to lay some groundwork for that possibility
in future.

It'd be nice to get to a point where we have a Hadoop-specific
implementation isolated to a jar that can be swapped out at runtime
for other file system implementations, as needed. I see that as a
somewhat long way off.
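To make the "swappable jar" idea concrete, here is a minimal sketch assuming a hypothetical VolumeFileSystem interface (not an existing Accumulo or Hadoop API) whose implementations are looked up by URI scheme. java.util.ServiceLoader is one standard JDK mechanism that would let a separate jar contribute an implementation at runtime by listing it in a META-INF/services file (providers need a public no-arg constructor). The in-memory backend is a toy stand-in for a non-Hadoop file system, and the "mem" scheme and file path are illustrative only.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

public class PluggableFileSystemSketch {

  /** Hypothetical minimal file system SPI; not an existing Accumulo or Hadoop interface. */
  public interface VolumeFileSystem {
    /** URI scheme handled by this implementation, e.g. "hdfs" or "mem". */
    String scheme();

    boolean exists(URI path);

    byte[] read(URI path) throws IOException;

    void write(URI path, byte[] data) throws IOException;
  }

  /** Toy in-memory backend standing in for some non-Hadoop file system. */
  public static final class InMemoryFileSystem implements VolumeFileSystem {
    private final Map<URI, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public String scheme() {
      return "mem";
    }

    @Override
    public boolean exists(URI path) {
      return files.containsKey(path);
    }

    @Override
    public byte[] read(URI path) throws IOException {
      byte[] data = files.get(path);
      if (data == null) {
        throw new IOException("No such file: " + path);
      }
      return data.clone(); // defensive copy so callers cannot mutate stored bytes
    }

    @Override
    public void write(URI path, byte[] data) {
      files.put(path, data.clone());
    }
  }

  /** Maps URI schemes to implementations, so callers never name a concrete backend. */
  public static final class FileSystemRegistry {
    private final Map<String, VolumeFileSystem> byScheme = new HashMap<>();

    /** Registers every implementation listed in META-INF/services on the classpath (possibly none). */
    public static FileSystemRegistry fromClasspath() {
      FileSystemRegistry registry = new FileSystemRegistry();
      for (VolumeFileSystem fs : ServiceLoader.load(VolumeFileSystem.class)) {
        registry.register(fs);
      }
      return registry;
    }

    public void register(VolumeFileSystem fs) {
      byScheme.put(fs.scheme(), fs);
    }

    public VolumeFileSystem forUri(URI uri) {
      VolumeFileSystem fs = byScheme.get(uri.getScheme());
      if (fs == null) {
        throw new IllegalArgumentException("No file system registered for scheme: " + uri.getScheme());
      }
      return fs;
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystemRegistry registry = FileSystemRegistry.fromClasspath();
    registry.register(new InMemoryFileSystem()); // explicit registration; no services file exists in this demo
    URI file = URI.create("mem:///accumulo/example.rf");
    VolumeFileSystem fs = registry.forUri(file);
    fs.write(file, "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(fs.read(file), StandardCharsets.UTF_8)); // prints hello
  }
}
```

Under this design, a Hadoop-backed implementation would sit in its own jar registered for the hdfs scheme, and server code would only ever see the interface.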

On Wed, Mar 25, 2020 at 2:08 PM  wrote:
>
>
>   I couldn't make the call today, but am curious if anyone has previously 
> brought up creating a FileSystem API for Accumulo so that we could use 
> implementations other than Hadoop. I realize that Hadoop provides 
> implementations for things other than HDFS but that doesn't necessarily mean 
> that all filesystem implementations are covered.

RE: Slack call notes

2020-03-25 Thread dlmarion



I couldn't make the call today, but am curious if anyone has previously 
brought up creating a FileSystem API for Accumulo so that we could use 
implementations other than Hadoop. I realize that Hadoop provides 
implementations for things other than HDFS but that doesn't necessarily mean 
that all filesystem implementations are covered.

-----Original Message-----
From: Christopher  
Sent: Wednesday, March 25, 2020 1:45 PM
To: accumulo-dev 
Subject: Slack call notes

Several committers/contributors in the community joined a call in Slack on 
Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of the 
call. Please feel free to add to them.

I shared the overall philosophy and backstory to some of the script 
improvements in 2.x to help guide current/future work on the scripts.

* bin/accumulo is inspired by old jpackage.org standards which are still in use 
in RPM macros for Java packaging in Fedora/RHEL/etc. The key idea is that 
scripts are simple... set up environment (class path, etc.), locate java, and 
exec a single process with the provided args.
* bin/accumulo-service is inspired by old SysVInit scripts for 
start/stop/restart/status of a single service
* behavior of bin/accumulo and bin/accumulo-service can be manipulated through 
launch environment
* bin/accumulo-cluster uses bin/accumulo-service, and is provided as a simple, 
out-of-the-box cluster management tool
* bin/accumulo-cluster and bin/accumulo-service are replaceable; they are 
useful out of the box, but one would expect them to be unnecessary if using 
systemd, or a vendor-provided cluster management system
* we discussed possibly moving bin/accumulo-cluster and bin/accumulo-service to 
contrib/ in the tarball, or some subdir of bin/, but it was suggested to not 
make too many disruptive changes there
* we discussed the possibility of adding a config file for bin/accumulo-cluster 
(also mentioned on
https://github.com/apache/accumulo/pull/1568)
* we discussed the need to document the intent/purpose/scope of the scripts in 
comments inside the scripts themselves
* Ed Coleman asked if it'd be good to document a systemd example; I suggested 
it might make for a good blog post (perhaps by the person who wrote the systemd 
unit files for Fluo Muchos)

Keith Turner discussed his development efforts with regard to enabling more 
controls over compactions.

* one main idea was to keep configuration/API for data separate from that for 
execution
* data is the concern of application owners, whereas execution involves system 
admins (resource contention, etc.)
* he will submit a PR for review when ready
* he also suggested another call to go over the PR

Billie Rinaldi discussed better support for Azure Data Lake Storage
Gen2 (ADLSv2).

* maintaining a fork for experimenting, and working on reliably testing issues 
involving WALs
* did not recommend using ADLSv2 with WALs, but said that we should still support it
* might need to implement a custom log closer to better support it

Mike Miller brought up the idea of eliminating more static internal state.

* ServerConfigurationFactory might be improved in this regard, with some 
additional ZK cleanup
* Other ZK cleanup might help elsewhere (such as ZooCache)
* I suggested tablet location cache might also benefit from being bound to an 
AccumuloClient lifecycle (or a dedicated opaque object that could be shared 
across AccumuloClient instances with its own user-managed lifecycle)

Please add anything I might have missed (or got wrong) in response to this post.

Slack call notes

2020-03-25 Thread Christopher

Several committers/contributors in the community joined a call in
Slack on Wednesday, at 1130-1230, New York (Eastern) time. Here are my
notes of the call. Please feel free to add to them.

I shared the overall philosophy and backstory to some of the script
improvements in 2.x to help guide current/future work on the scripts.

* bin/accumulo is inspired by old jpackage.org standards which are
still in use in RPM macros for Java packaging in Fedora/RHEL/etc. The
key idea is that scripts are simple... set up environment (class path,
etc.), locate java, and exec a single process with the provided args.
* bin/accumulo-service is inspired by old SysVInit scripts for
start/stop/restart/status of a single service
* behavior of bin/accumulo and bin/accumulo-service can be manipulated
through launch environment
* bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
simple, out-of-the-box cluster management tool
* bin/accumulo-cluster and bin/accumulo-service are replaceable; they
are useful out of the box, but one would expect them to be
unnecessary if using systemd, or a vendor-provided cluster management
system
* we discussed possibly moving bin/accumulo-cluster and
bin/accumulo-service to contrib/ in the tarball, or some subdir of
bin/, but it was suggested to not make too many disruptive changes
there
* we discussed the possibility of adding a config file for
bin/accumulo-cluster (also mentioned on
https://github.com/apache/accumulo/pull/1568)
* we discussed the need to document the intent/purpose/scope of the
scripts in comments inside the scripts themselves
* Ed Coleman asked if it'd be good to document a systemd example; I
suggested it might make for a good blog post (perhaps by the person
who wrote the systemd unit files for Fluo Muchos)

Keith Turner discussed his development efforts with regard to enabling
more controls over compactions.

* one main idea was to keep configuration/API for data separate from
that for execution
* data is the concern of application owners, whereas execution involves
system admins (resource contention, etc.)
* he will submit a PR for review when ready
* he also suggested another call to go over the PR

Billie Rinaldi discussed better support for Azure Data Lake Storage
Gen2 (ADLSv2).

* maintaining a fork for experimenting, and working on reliably
testing issues involving WALs
* did not recommend using ADLSv2 with WALs, but said that we should still support it
* might need to implement a custom log closer to better support it

Mike Miller brought up the idea of eliminating more static internal state.

* ServerConfigurationFactory might be improved in this regard, with
some additional ZK cleanup
* Other ZK cleanup might help elsewhere (such as ZooCache)
* I suggested tablet location cache might also benefit from being
bound to an AccumuloClient lifecycle (or a dedicated opaque object
that could be shared across AccumuloClient instances with its own
user-managed lifecycle)
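One way to picture the tablet location cache suggestion: instead of static, JVM-wide state, the cache is an object whose lifetime belongs to whoever created it. The sketch assumes hypothetical LocationCache and Client classes (not Accumulo's actual AccumuloClient or cache types; tablet and server names are made up) and covers both options mentioned: a client that owns and closes its own cache, and a cache shared across clients whose lifecycle the user manages.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ScopedCacheSketch {

  /** Hypothetical tablet-location cache; names are illustrative only. */
  public static final class LocationCache implements AutoCloseable {
    private final Map<String, String> tabletToServer = new ConcurrentHashMap<>();
    private volatile boolean closed = false;

    public void put(String tablet, String server) {
      ensureOpen();
      tabletToServer.put(tablet, server);
    }

    public Optional<String> lookup(String tablet) {
      ensureOpen();
      return Optional.ofNullable(tabletToServer.get(tablet));
    }

    private void ensureOpen() {
      if (closed) {
        throw new IllegalStateException("location cache is closed");
      }
    }

    @Override
    public void close() {
      closed = true;
      tabletToServer.clear();
    }
  }

  /** Hypothetical client: owns a private cache unless the caller supplies a shared one. */
  public static final class Client implements AutoCloseable {
    private final LocationCache cache;
    private final boolean ownsCache;

    public Client() {
      this(new LocationCache(), true);
    }

    public Client(LocationCache shared) {
      this(shared, false);
    }

    private Client(LocationCache cache, boolean ownsCache) {
      this.cache = cache;
      this.ownsCache = ownsCache;
    }

    public LocationCache cache() {
      return cache;
    }

    @Override
    public void close() {
      // Release only state this client created; a shared cache is closed by its owner.
      if (ownsCache) {
        cache.close();
      }
    }
  }

  public static void main(String[] args) {
    try (LocationCache shared = new LocationCache()) {
      try (Client a = new Client(shared); Client b = new Client(shared)) {
        a.cache().put("tablet-A", "tserver-1:9997");
        System.out.println(b.cache().lookup("tablet-A").orElse("miss")); // prints tserver-1:9997
      }
      // Both clients are closed, but the user-managed shared cache is still usable.
      System.out.println(shared.lookup("tablet-A").isPresent()); // prints true
    }
    try (Client solo = new Client()) {
      // A privately owned cache starts empty and is cleared when this client closes.
      System.out.println(solo.cache().lookup("tablet-A").orElse("miss")); // prints miss
    }
  }
}
```

Tying ownership to AutoCloseable lets try-with-resources release the state deterministically, which is the kind of cleanup that static internal state makes hard.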

Please add anything I might have missed (or got wrong) in response to this post.
