Yeah, I agree the docs are not clear. I didn't actually run it myself, just took a peek at the source at https://github.com/minio/minio/blob/master/cmd/gateway/hdfs/gateway-hdfs.go

On Tue, Apr 28, 2020 at 9:43 PM Christopher <ctubb...@apache.org> wrote:

> The page seemed to describe an S3 gateway, as well as a separate HDFS
> gateway, but I only took a superficial reading of the docs, and did
> not look at the code at all.
>
> On Tue, Apr 28, 2020 at 4:19 PM Michael Wall <mjw...@gmail.com> wrote:
> >
> > That HDFS gateway appears to be an S3 layer on top of HDFS, not an HDFS
> > layer on top of S3/Minio. It allows you to write code to use Minio and
> > pull existing data from HDFS as you migrate it into Minio. As far as I can
> > tell, it would not work without changes to Accumulo.
> >
> > In the next week or so I'll look at actually putting interfaces around the
> > HDFS interactions for RFiles and WALs as a first step. I will report back
> > with my findings and hopefully some code.
> >
> > Thanks
> >
> > Mike
> >
> > On Fri, Apr 24, 2020 at 10:32 PM Christopher <ctubb...@apache.org> wrote:
> >
> > > I'm not familiar with it, but the website says it can replace HDFS.
> > > There appears to be an "HDFS Gateway"
> > > (https://github.com/minio/minio/blob/master/docs/gateway/hdfs.md) that
> > > might be useful. At a glance, it looks like no abstraction is needed
> > > in Accumulo code for it... you just run the gateway and
> > > Accumulo believes it is using HDFS, but it is really using MinIO
> > > instead.
> > >
> > > There also might be a Hadoop FileSystem implementation for it to use
> > > it directly without a Gateway, but I didn't have any luck with a quick
> > > search for one.
> > >
> > > In either case, there shouldn't need to be any changes to Accumulo itself.
> > >
> > > If changes to Accumulo do become necessary (or desired), I'd be
> > > interested in collaborating on that part. If it's just a matter of
> > > trying it with the Gateway or existing Hadoop FileSystem
> > > implementation, I'd also be interested in testing any step-by-step
> > > HOWTO guides somebody might want to write as a blog post.
> > >
> > > On Fri, Apr 24, 2020 at 11:20 AM Mike Miller <mmil...@apache.org> wrote:
> > > >
> > > > I have no experience with MinIO but would be interested in learning more
> > > > and collaborating.
> > > >
> > > > On Fri, Apr 24, 2020 at 10:57 AM Michael Wall <mjw...@apache.org> wrote:
> > > >
> > > > > Resurrecting this thread on the File System API. I have been thinking
> > > > > about giving Minio [1] a try for both WALs and RFiles. Seems to me like
> > > > > step one is to abstract internal interfaces for both targeted against 2.1?
> > > > > Couple of questions
> > > > >
> > > > > 1 - Anyone have experience with minio?
> > > > > 2 - Anyone interested in collaborating? Thinking anything from providing
> > > > > input to helping to test once we get a prototype to actually doing some
> > > > > development.
> > > > >
> > > > > Thanks, hope everyone is staying safe and healthy.
> > > > >
> > > > > [1] - https://min.io/
> > > > >
> > > > > On Wed, Mar 25, 2020 at 6:08 PM Christopher <ctubb...@apache.org> wrote:
> > > > >
> > > > > > Only 705 across 280 files, if you exclude Text, though :)
> > > > > >
> > > > > > grep -rP 'org[.]apache[.]hadoop(?![.]io[.]Text)' --include='*.java' * | grep -v test/ | wc -l
> > > > > >
> > > > > > On Wed, Mar 25, 2020 at 3:34 PM Mike Miller <mmil...@apache.org> wrote:
> > > > > > >
> > > > > > > I think we have come a long way removing any external types from the API,
> > > > > > > for reasons other than de-coupling from Hadoop. While we don't have many
> > > > > > > dependencies on the other components of Hadoop, we are still very tightly
> > > > > > > coupled to HDFS.
> > > > > > > For example, some quick grep'ing of the code shows:
> > > > > > > "grep -r "import org.apache.hadoop" --include=*.java * | wc -l"
> > > > > > > 1734
> > > > > > > Without tests it is slightly more feasible...
> > > > > > > grep -r "import org.apache.hadoop" --include=*.java * | grep -v "test" | wc -l
> > > > > > > 858
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 3:19 PM David Mollitor <dam6...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I too have been thinking about this for a pet project. There is already
> > > > > > > > Apache Commons VFS that, with some investment, could probably serve all
> > > > > > > > these requirements.
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020, 3:16 PM Christopher <ctubb...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > (Forking this thread, as it's a distinct topic)
> > > > > > > > >
> > > > > > > > > I've thought about it. The idea has driven me to try to reduce our use
> > > > > > > > > of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
> > > > > > > > > some abstraction, wherever possible. Though, I'll admit, we're nowhere
> > > > > > > > > close to where we'd want to be to be fully decoupled from Hadoop.
> > > > > > > > >
> > > > > > > > > I've also been looking a lot at our VolumeManager code lately, to try
> > > > > > > > > to improve it a bit, and to create better abstractions for Volumes,
> > > > > > > > > that could aid future work in this area.
> > > > > > > > >
> > > > > > > > > But, I haven't directly been working on new FileSystem API
> > > > > > > > > abstraction... just trying to lay some groundwork for that possibility
> > > > > > > > > in future.
> > > > > > > > >
> > > > > > > > > It'd be nice to get to a point where we have a Hadoop-specific
> > > > > > > > > implementation isolated to a jar that can be swapped out at runtime
> > > > > > > > > for other file system implementations, as needed. I see that as a
> > > > > > > > > somewhat long-way off.
> > > > > > > > >
> > > > > > > > > On Wed, Mar 25, 2020 at 2:08 PM <dlmar...@comcast.net> wrote:
> > > > > > > > > >
> > > > > > > > > > I couldn't make the call today, but am curious if anyone has
> > > > > > > > > > previously brought up creating a FileSystem API for Accumulo so that we
> > > > > > > > > > could use implementations other than Hadoop. I realize that Hadoop provides
> > > > > > > > > > implementations for things other than HDFS but that doesn't necessarily
> > > > > > > > > > mean that all filesystem implementations are covered.
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Christopher <ctubb...@apache.org>
> > > > > > > > > > Sent: Wednesday, March 25, 2020 1:45 PM
> > > > > > > > > > To: accumulo-dev <dev@accumulo.apache.org>
> > > > > > > > > > Subject: Slack call notes
> > > > > > > > > >
> > > > > > > > > > Several committers/contributors in the community joined a call in Slack
> > > > > > > > > > on Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of
> > > > > > > > > > the call. Please feel free to add to them.
> > > > > > > > > >
> > > > > > > > > > I shared the overall philosophy and backstory to some of the script
> > > > > > > > > > improvements in 2.x to help guide current/future work on the scripts.
> > > > > > > > > >
> > > > > > > > > > * bin/accumulo is inspired by old jpackage.org standards which are
> > > > > > > > > >   still in use in RPM macros for Java packaging in Fedora/RHEL/etc. The key
> > > > > > > > > >   idea is that scripts are simple... set up environment (class path, etc.),
> > > > > > > > > >   locate java, and exec a single process with the provided args.
> > > > > > > > > > * bin/accumulo-service is inspired by old SysVInit scripts for
> > > > > > > > > >   start/stop/restart/status of a single service
> > > > > > > > > > * behavior of bin/accumulo and bin/accumulo-service can be manipulated
> > > > > > > > > >   through launch environment
> > > > > > > > > > * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
> > > > > > > > > >   simple, out-of-the-box cluster management tool
> > > > > > > > > > * bin/accumulo-cluster and bin/accumulo-service are replaceable; they
> > > > > > > > > >   are useful for out-of-the-box, but one would expect them to be unnecessary
> > > > > > > > > >   if using systemd, or a vendor-provided cluster management system
> > > > > > > > > > * we discussed possibly moving bin/accumulo-cluster and
> > > > > > > > > >   bin/accumulo-service to contrib/ in the tarball, or some subdir of bin/,
> > > > > > > > > >   but it was suggested to not make too many disruptive changes there
> > > > > > > > > > * we discussed the possibility of adding a config file for
> > > > > > > > > >   bin/accumulo-cluster (also mentioned on
> > > > > > > > > >   https://github.com/apache/accumulo/pull/1568)
> > > > > > > > > > * we discussed the need to document the intent/purpose/scope of the
> > > > > > > > > >   scripts in comments inside the scripts themselves
> > > > > > > > > > * Ed Coleman asked if it'd be good to document a systemd example; I
> > > > > > > > > >   suggested it might make for a good blog post (perhaps by the person who
> > > > > > > > > >   wrote the systemd unit files for Fluo Muchos)
> > > > > > > > > >
> > > > > > > > > > Keith Turner discussed his development efforts with regard to enabling
> > > > > > > > > > more controls over compactions.
> > > > > > > > > >
> > > > > > > > > > * one main idea was to keep configuration/API for data separate from
> > > > > > > > > >   that for execution
> > > > > > > > > > * data is of concern to application owners, whereas execution involves
> > > > > > > > > >   system admins (resource contention, etc.)
> > > > > > > > > > * he will submit a PR for review when ready
> > > > > > > > > > * he also suggested another call to go over the PR
> > > > > > > > > >
> > > > > > > > > > Billie Rinaldi discussed better support for Azure Data Lake Storage
> > > > > > > > > > Gen2 (ADLSv2).
> > > > > > > > > >
> > > > > > > > > > * maintaining a fork for experimenting, and working on reliably testing
> > > > > > > > > >   issues involving WALs
> > > > > > > > > > * did not recommend using ADLSv2 with WALs, but that we should still
> > > > > > > > > >   support it
> > > > > > > > > > * might need to implement a custom log closer to better support it
> > > > > > > > > >
> > > > > > > > > > Mike Miller brought up the idea of eliminating more static internal
> > > > > > > > > > state.
> > > > > > > > > >
> > > > > > > > > > * ServerConfigurationFactory might be improved in this regard, with some
> > > > > > > > > >   additional ZK cleanup
> > > > > > > > > > * Other ZK cleanup might help elsewhere (such as ZooCache)
> > > > > > > > > > * I suggested tablet location cache might also benefit from being bound
> > > > > > > > > >   to an AccumuloClient lifecycle (or a dedicated opaque object that could be
> > > > > > > > > >   shared across AccumuloClient instances with its own user-managed lifecycle)
> > > > > > > > > >
> > > > > > > > > > Please add anything I might have missed (or got wrong) in response to
> > > > > > > > > > this post.