Re: [REPORT][DRAFT] Apache Accumulo April 2020
Maybe clarify in the "issues" section that a resolution is in sight but it's not done yet. You imply this in other words, but being clear that the trademark issue is "acknowledged by the owner and the PMC is waiting on a fix by that owner" is helpful for someone who is moving through the report quickly.

On 4/6/20 1:43 PM, Michael Wall wrote:
The Apache Accumulo PMC decided to draft its quarterly board reports on the dev list. Here is a draft of our report, which is due Wednesday, Apr 8, 1 week before the board meeting on Wednesday, Apr 15. Please let me know if you have any feedback. I'll post it on Wed. Some more detailed metrics are at https://reporter.apache.org/wizard/statistics?accumulo, which appears to require a committer login.

Mike

[REPORT] Accumulo - Apr 2020

## Description:
The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high-performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift.

## Issues:
The Oct report listed a discussion about a trademark issue at http://www.accumulodata.com [1]. The owner is still looking for the right account for that site so it can be repointed to https://accumulo.apache.org.

## Membership Data:
Apache Accumulo was founded 2012-03-20 (8 years ago). There are currently 36 committers and 36 PMC members in this project. The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Holly Keebler on 2019-08-08.
- No new committers. Last addition was Holly Keebler on 2019-08-09.
- Ongoing discussion about adding a new member.

## Project Activity:
- No new releases this quarter, although 1.10 is still in the works. The 1.10 release will also be our first LTS release [2].
- There was work done to improve the website generation using GitHub Actions [3].
- The monthly "Hack Day" continues in Columbia MD.
There was nothing of note posted from the meetings in Jan [4] and Feb [5]. The March Hack Day had a few notes [6].
- Some of the developers participated in a Slack call [7] on Mar 24; notes were written to the mailing list [8].

## Community Health:
- Activity in the community is consistent. There is less activity on the mailing lists but more on GitHub issues and PRs [9].

[1]: https://lists.apache.org/thread.html/514d3cf9162e72f4aa13be1db5d6685999fc83755695308a529de4d6@%3Cprivate.accumulo.apache.org
[2]: https://lists.apache.org/thread.html/43f051404bc5f15cde8f971ccbdc4cf7b017cc014affd914c357eaad%40%3Cdev.accumulo.apache.org%3E
[3]: https://lists.apache.org/thread.html/rc9dacacb7bafd1d2289cdfa67ab31d5f4c0c1c47eb1afc905d62ef77%40%3Cdev.accumulo.apache.org%3E
[4]: https://lists.apache.org/thread.html/r873b186740d0c1c078edafbf0af4fab0158f85aabc74348cfdf8acc8%40%3Cdev.accumulo.apache.org%3E
[5]: https://lists.apache.org/thread.html/r0c43fdc622d446a0f5cbec79085de86e8ad098a173a73739e86c98fd%40%3Cdev.accumulo.apache.org%3E
[6]: https://lists.apache.org/thread.html/r3753f5ee8caba67fc00a4a6af36c75018349085f9c5fd7892ba7d7aa%40%3Cdev.accumulo.apache.org%3E
[7]: https://lists.apache.org/thread.html/r494ba26ee4e8f16fc1b865bb363f3e4a9035738d8c49f10505d6e4f5%40%3Cdev.accumulo.apache.org%3E
[8]: https://lists.apache.org/thread.html/r2ae8f3375fc2c2e36b11e576456b8697f29057c06d0bf89c6e165d14%40%3Cdev.accumulo.apache.org%3E
[9]: https://reporter.apache.org/wizard/statistics?accumulo
Re: accumulo trace from monitor
Hadoop provides the CredentialProvider API as a way to avoid storing passwords in plaintext on your filesystem. This is done via a JCEKS file located somewhere on the local filesystem or HDFS. If I were to take a guess, I'd assume that the trace token (username/principal and password) is stored using a CredentialProvider, probably with Cloudera Manager managing that file for you. The error you give almost looks like Accumulo can't find/extract the password from the JCEKS file. However, if you have changed the password, I'm not sure how CM would know that you have changed it "underneath" it (I can't imagine it could know this). Regardless, I'm sure there is a way to fix the JCEKS file so that it has the new password. Reaching out to a Cloudera rep (services or support) would be a good idea.

On Fri, Mar 13, 2020 at 1:08 PM marathiboy wrote:
> Thanks,
>
> I am using accumulo 1.9.2-cdh6.1.0 (installed using cloudera parcel)
>
> As far as I know I didn't add anything related to credential provider, and when I searched for credential, I don't get any results back.
>
> Thanks
>
> S
>
> --
> Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html
Re: Replication-related IT failures
I'm really upset that you think suggesting removal of the feature is appropriate. More installations of HBase than not (which, IMO, should be considered Accumulo's biggest competitor) use replication. The only HBase users I see without a disaster recovery plan are developer-focused instances with zero uptime guarantees. I'll go even further: any user who deploys a database into a production scenario would *require* a D/R solution for that database before it would be allowed to be called "production". Yes, there are D/R solutions that can be implemented at the data processing layer, but this is almost always less ideal, as the cost of reprocessing and shipping the raw data is much greater than what Accumulo replication could do. While I am deflated that no other developers have seen this and have any interest in helping work through bugs/issues, they are volunteers and I can only be sad about this. However, I will not let an argument which equates to "we should junk the car because it has a flat tire" go without response.

On 1/28/20 10:58 PM, Christopher wrote:
As succinctly as I can:
1. Replication-related ITs have been flakey for a long time,
2. The feature is not actively maintained (critical, or at least untriaged, issues exist dating back to 2014 in JIRA),
3. No volunteers have stepped up thus far to maintain the ITs and make them reliable, or to develop/maintain replication,
4. I don't have time to fix the flakey ITs, and have no interest in or use case for maintaining the feature,
5. The IT breakages interfere with build testing on CI servers and for releases.

Therefore:
A. I want to @Ignore the flakey ITs, so they don't keep interfering with test builds,
B. We can re-enable the ITs if/when a volunteer contributes reliability fixes for them,
C. If nobody steps up, we should have a separate conversation about possibly phasing out the feature and what that would look like.

The conversation I suggest in "C" is a bit premature right now.
I'm starting with this email to see if any volunteers want to step up. Even if somebody steps up immediately, they may not have a fix immediately. So, if there are no objections, I'm going to disable the flakey tests soon by adding the '@Ignore' JUnit annotation until a fix is contributed, so they don't keep getting in the way of troubleshooting other build-related issues. We already know they are flakey... the constant failures aren't telling us anything new, so the tests aren't useful as is.
Re: [LAZY][VOTE] A basic, but concrete, LTS proposal
Seems fine to me. Any expectations on how upgrades work within an LTS release line? How about across LTS releases? Some specific situations to mull over:
* Can we do a rolling upgrade within an LTS release line (to a new patch version) with no downtime? (e.g. 1.9.1 to 1.9.3)
* Can any LTS release (1.9.1) be guaranteed to upgrade to a later LTS release (2.3.1)?
* What about rolling back within an LTS release line (e.g. 2.3.2 back to 2.3.1 after some bug is found)?
Not looking for immediate answers, but it would be good to define the expectations around what we want Accumulo to be able to do (ignoring the fact that bugs will certainly arise around upgrades/downgrades).

On 10/30/19 9:00 PM, Christopher wrote:
Following up from the discussion at https://lists.apache.org/thread.html/560bfe8d911be5b829e6250a34dfa1ace0584b24251651be1c77d724@%3Cdev.accumulo.apache.org%3E I think we should adopt this LTS concept:

LTS releases:
* Designate a new LTS line every 2 years (designation communicates intent to support/patch)
* Target patch releases to LTS lines for 3 years
* EOL the previous LTS line when the new one has been available for 1 year

non-LTS releases:
* Periodic releases that aren't expected to be supported with patch releases
* Can still create patch releases, but only until the next LTS/non-LTS release line (typically only for critical bugs, because we won't keep a maintenance branch around for non-LTS... instead, we'll roll bugfixes into the next release, or branch off the tag for a critical bug)
* non-LTS releases are EOL as soon as the next LTS/non-LTS release line is created

Transition plan:
* Define LTS on the downloads page of the website
* Designate 1.9 as the first (and currently only) LTS release line
* Mark the LTS expected EOL date on the downloads page next to the LTS releases (to the month... we don't need to get too granular/pedantic)

What this proposal does *not* do is determine how frequently we release. It *only* determines which versions we will designate as LTS.
So, this doesn't bind us to any fixed release schedule, and we can release as frequently (or infrequently) as our community wishes (though I hope the non-LTS releases will occur more frequently, as they can take more creative risks). But the main point of this proposal is that every two years, we'll designate a new release that will take over as our main "supported line" that will be low-risk and more stable over time. The 1-year overlap for people to upgrade from one LTS to the next in this plan is pretty useful, too, I think. Here's an example set of hypothetical releases (except 1.9.x and 2.0.0, which are real) under this plan:

* LTS (2018): 1.9.0 -> 1.9.1 -> 1.9.2 -> ... -> EOL (2021)
* non-LTS (2018-2020): 2.0.0 -> 2.1.0 -> 2.1.1 (critical bug fix) -> 2.2.0
* LTS (2020): 2.3.0 -> 2.3.1 -> 2.3.2 -> ... -> EOL (2023)
* non-LTS (2020-2022): 2.4.0 -> 2.5.0 -> 3.0.0
* LTS (2022): 3.1.0 -> 3.1.1 -> 3.1.2 -> ... -> EOL (2025)

This LTS proposal isn't perfect and doesn't solve all possible issues, but I think it establishes the groundwork for future release plans/schedules and helps frame discussions about future releases that we can work through later if needed. If there's general consensus on the basic proposal here, I can start updating the website after 72 hours (lazy consensus) to add the LTS definition and mark things on the downloads page accordingly. If it turns into a significant discussion, I'll hold off on anything until the discussion points are resolved. If there's disagreement that can't be resolved, I'll start a more formal vote later (or give up due to lost motivation, worst case :smile:). -- Christopher
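The proposed cadence (a new LTS line every 2 years, each patched for 3 years, the previous line EOL'd 1 year after the next appears) can be sanity-checked with a few lines of arithmetic. This is just an illustration of the schedule in the proposal; `lts_schedule` is an invented name, not project tooling.

```python
# Sketch of the proposed LTS cadence: designate a new LTS line every
# 2 years, and patch each line for 3 years after its designation.
def lts_schedule(first_year, count):
    """Return (designation_year, eol_year) pairs for `count` LTS lines."""
    return [(year, year + 3)
            for year in range(first_year, first_year + 2 * count, 2)]

schedule = lts_schedule(2018, 3)
# Matches the hypothetical releases above: the 2018 line (1.9.x) is
# EOL'd in 2021, the 2020 line in 2023, the 2022 line in 2025.
assert schedule == [(2018, 2021), (2020, 2023), (2022, 2025)]

# The previous LTS outlives the next one's designation by exactly 1 year,
# giving the upgrade-overlap window the proposal describes.
overlap = schedule[0][1] - schedule[1][0]
assert overlap == 1
```

The 1-year overlap falls out of the arithmetic (3-year support minus the 2-year designation interval), which is why the "EOL previous LTS when the new one has been available for 1 year" rule is self-consistent with the other two bullets.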
WALs and HDFS (was Re: Accumulo on Azure - Long Term Monitoring)
Forking this off because I don't think it's related to Tushar's original question.

HBase and Accumulo both implement a WAL that relies on a distributed filesystem which:
1. Is API-compatible with HDFS
2. Guarantees that data written prior to an hflush()/hsync() is durable

There are actually a few filesystems capable of this: HDFS (duh), Azure's Windows Azure Storage Blob (WASB), Azure's Data Lake Store (ADLS), and Azure's Blob Filesystem (ABFS). Azure has had a pretty long interaction with the upstream Hadoop project (and some ties in with the HBase project) to make sure that we know how to configure the Hadoop drivers for those Azure blob stores to provide that durability guarantee. That said, it's wrong to say that HBase/Accumulo in a cloud solution require HDFS. It is accurate to say that S3 (via the S3A adapter) does not provide the durability guarantees that HBase/Accumulo need for WALs (EMRFS does, from what I've heard through the grapevine, but requires you to be using EMR).

On 10/25/19 1:49 PM, David Mollitor wrote:
Hello Team, One shortcoming of Apache Accumulo and Apache HBase, as I understand it, is that they both rely on HDFS for replicated WAL management. Therefore, HDFS is a requirement even if deploying to a cloud solution. I believe Google has developed a consensus-enabled WAL management so that three instances can be stood up without any external dependencies (other than storage for the collection of rfiles/hfiles). I'd be interested to hear your thoughts on this.

On Fri, Oct 25, 2019 at 1:46 PM Mike Miller wrote:
Hi Tushar, The closest thing we have are the performance tests in accumulo-testing, which is probably the best place to look. https://github.com/apache/accumulo-testing#performance-test The instructions for setting up the scripts are in the README. There are only a limited number of tests written, though, and they used to be integration tests that were moved out of the main test package.
org.apache.accumulo.testing.performance.tests.DurabilityWriteSpeedPT
org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT
org.apache.accumulo.testing.performance.tests.ScanExecutorPT
org.apache.accumulo.testing.performance.tests.ScanFewFamiliesPT
org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT
org.apache.accumulo.testing.performance.tests.RandomCachedLookupsPT

On Thu, Oct 24, 2019 at 8:09 PM Tushar Dhadiwal wrote:
Hello Everyone, I am a Software Engineer at Microsoft and our team is currently working on making the deployment and operation of Accumulo on Azure as seamless as possible. As part of this effort, we are attempting to observe/measure some standard Accumulo operations (e.g. scans, canary queries, ingest, etc.) and how their performance varies over time on long-standing Accumulo clusters running in Azure. As part of this we're looking to come up with a metric that we can use to evaluate how healthy/available an Accumulo cluster is. Over time we intend to use this to understand how underlying platform changes in Azure can affect the overall health of Accumulo workloads. As a starting metric, for example, we are thinking of continually doing scans of random values across various tablet servers and capturing timing information related to how long such scans take. I took a quick look at the accumulo-testing repo and didn't find any tests or probes attempting to do something along these lines. Does something like this seem reasonable? Has anyone previously attempted something similar? Does accumulo-testing seem like a reasonable place for code that attempts to do something like this? Appreciate your thoughts and feedback. Cheers, Tushar Dhadiwal
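To make the WAL durability contract from the fork above concrete, here is a toy sketch in Python, with a local file and `os.fsync()` standing in for a distributed filesystem's hflush()/hsync(). This is only an analogy for the guarantee Accumulo needs, not its actual WAL implementation; `ToyWal` is an invented name for illustration.

```python
import os
import tempfile

# Toy write-ahead log illustrating the durability contract: a record may
# only be acknowledged to the client once it has been forced to stable
# storage. On HDFS-compatible filesystems, hflush()/hsync() play the role
# that flush() + os.fsync() play here on a local file.
class ToyWal:
    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def append(self, record: bytes):
        self.f.write(record + b"\n")   # buffered: NOT yet durable

    def sync(self):
        self.f.flush()                 # push Python's buffer to the OS...
        os.fsync(self.f.fileno())      # ...and force the OS to persist it

wal_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
wal = ToyWal(wal_path)
wal.append(b"mutation-1")
wal.sync()  # only now is it safe to acknowledge mutation-1 to the client
```

S3A fails this contract because a plain object store cannot make a partially written object visible and durable at an arbitrary flush point, which is exactly why the thread above rules it out for WALs.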
Re: Help with PR 1333
Also, just in case you're feeling this way: any kind of contribution that you want to put together is helpful, welcome, and appreciated. Please don't feel like you're unable to contribute because you can't get something "substantial" together. Sometimes it's the smallest or "silliest" changes that can make the biggest impact.

On 10/16/19 9:37 AM, David Mollitor wrote:
Hello Gang, I work with a customer that uses Accumulo. My full-time position is not in development, so while I'm curious to look into Accumulo a bit, I can't make substantial contributions at this time. However, I do enjoy working on things that I like to call "below the waterline": reviewing code, documentation, and performing small clean-up tasks when and where I can. With that said, I started looking at cleaning up code in the LRUCache. However, it led me down a bit of a rabbit hole and I discovered that the LRU cache is deleting more data than it needs to. I've addressed that issue in the same PR. Is someone able to assist me with review and submission? https://github.com/apache/accumulo/pull/1333 Thanks!
Re: rc2 test question
Yeah, this has been sporadically failing since at least the 1.7 days.

On 7/30/19 1:37 PM, Owens, Mark wrote: Yep, after several continual failures it then started passing.

-----Original Message-----
From: Adam Lerman
Sent: Tuesday, July 30, 2019 1:26 PM
To: dev@accumulo.apache.org
Subject: Re: rc2 test question

Mark, I have seen that test class fail often, both on personal machines and on AWS instances. I can't quantify it as I haven't kept track of when it has happened, but I know it's that class. Here is a link to some discussion that happened around that class on Slack: https://the-asf.slack.com/archives/CERNB8NDC/p1560797596014800?thread_ts=1560797596.014800=CERNB8NDC
Re: Accumulo Website question - I will add these steps to the README for Ubuntu/Pop users.
Would be better to add:
```
$ gem install bundler
$ bundle install  # should automatically install Jekyll for you
```
Using gem to install Jekyll installs it "globally" instead of local to your "bundle" (the accumulo website). This increases the likelihood that you'll have some version clash of Ruby dependencies on your local machine. You would then run Jekyll via `bundle exec jekyll <..>` instead of just `jekyll`. I don't see a reason why you would create a `jekyll` user. I suggest you drop that unless you have a reason. Thanks for updating the documentation around this.

On 6/7/19 2:14 PM, Jeffrey Zeiberg wrote:
Step 1: Installing Ruby
First, log into your server, then execute these commands:
sudo apt-get update
sudo apt-get install ruby-full make gcc nodejs build-essential patch
Step 2: Setting up Jekyll
This part is quite easy. Simply execute the following to install Jekyll and its dependencies using Gem:
gem install jekyll bundler
Now, create a user for it:
useradd jekyll
Re: NoSQL day summary
(Ensuring it goes out to all lists — thanks, Artem.) Also, thank you to CCRi! I missed them as a sponsor in the original message.

On 5/24/19 4:24 PM, Josh Elser wrote:
(pardon the cross-post -- please reply-list unless there's a good reason to cross-post some more)

Hi,

While NoSQL day is fresh in my head, I wanted to share some general information about the event this past Tuesday. We got started around 9:30 AM in D.C., yours truly welcoming everyone, followed by a fellow from Intel talking about some hardware they have coming and the work that Ram and Anoop have been doing to leverage it in HBase (sadly, we didn't have them in person!). Two gents from Microsoft Azure got on stage to talk about Azure and the HBase and Phoenix support on HDInsight. From there, we broke into two rooms, each of which held seven talks. We had lots of familiar faces, but also some new faces (even for me!). After 5pm, we broke out some drinks and snacks and had a candid Q&A/panel session with a smattering of folks from each community. The audience gave us some questions to ask them, but I also tried to interject a few doozies to make them sweat.

All said and done, we had about 170 individuals registered, about 140 folks showed up, and we had roughly 110 of them remaining by the end of the day. We were quite happy with these numbers, as the percentage of registrants who attend is usually 20-30% lower than this.

Talks were recorded along with their slide presentations. Editing/processing on these will take some time -- I'd expect a month before I'm able to get these posted on YouTube for everyone (but rest assured that it will happen). All attendees should be receiving a survey to give us feedback about the event, but I'd also encourage anyone who doesn't want to use the form to send me feedback directly. The hope is that we can keep this tradition going next year, but it's always a struggle.
I can say that we could not have done this without the sponsorship of Bloomberg, Intel, Microsoft, Salesforce (and, of course, Cloudera). Thank you all very much. - Josh
NoSQL day summary
(pardon the cross-post -- please reply-list unless there's a good reason to cross-post some more)

Hi,

While NoSQL day is fresh in my head, I wanted to share some general information about the event this past Tuesday. We got started around 9:30 AM in D.C., yours truly welcoming everyone, followed by a fellow from Intel talking about some hardware they have coming and the work that Ram and Anoop have been doing to leverage it in HBase (sadly, we didn't have them in person!). Two gents from Microsoft Azure got on stage to talk about Azure and the HBase and Phoenix support on HDInsight. From there, we broke into two rooms, each of which held seven talks. We had lots of familiar faces, but also some new faces (even for me!). After 5pm, we broke out some drinks and snacks and had a candid Q&A/panel session with a smattering of folks from each community. The audience gave us some questions to ask them, but I also tried to interject a few doozies to make them sweat.

All said and done, we had about 170 individuals registered, about 140 folks showed up, and we had roughly 110 of them remaining by the end of the day. We were quite happy with these numbers, as the percentage of registrants who attend is usually 20-30% lower than this.

Talks were recorded along with their slide presentations. Editing/processing on these will take some time -- I'd expect a month before I'm able to get these posted on YouTube for everyone (but rest assured that it will happen). All attendees should be receiving a survey to give us feedback about the event, but I'd also encourage anyone who doesn't want to use the form to send me feedback directly. The hope is that we can keep this tradition going next year, but it's always a struggle. I can say that we could not have done this without the sponsorship of Bloomberg, Intel, Microsoft, Salesforce (and, of course, Cloudera). Thank you all very much.

- Josh
Talk submissions for NoSQL day?
Coming back from vacation, I don't see any submissions from the pool of developers I'd normally expect to see. I was hoping that since this was in the "backyard" for most folks, we'd have enough talks to fill a room for a day. Abstracts were set to close last Friday but, as best as I can see, they are still open now. If you were planning on submitting something, please do so ASAP. https://dataworkssummit.com/nosql-day-2019/ At present, we barely have enough submissions for a half-day of Accumulo content (if all submissions are accepted). - Josh
Re: [DRAFT] [REPORT] Apache Accumulo - April 2019
Some general comments that I suspect the board will ask on their own. If you want to proactively address them, it might keep away some back-and-forth, but it's up to you.
* Any prospects on new committers/PMC members?
* An acknowledgement that no decisions are being made at the Hack Day, and that there will be some summary of relevant discussions on the dev list for those who were not physically present.
Looks like you covered all of the good stuff.

On 4/9/19 9:58 AM, Michael Wall wrote:
The Apache Accumulo PMC decided to draft our quarterly board reports on the dev list. Here is a draft of our report, which is due by Wednesday, Apr 10, 1 week before the board meeting on Wednesday, Apr 17. Please let me know if you have any feedback. I will submit this tomorrow morning. Sorry for the short timeline.

Mike

## Description:
- The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high-performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- There was 1 new release, accumulo-2.0.0-alpha-2, since the last report [1]. Progress toward the 2.0.0 release continues.
- A 1.9.3 release is in the works and expected soon [2].
- The community once again started a monthly "Hack Day" in Columbia MD that is open to all contributors [3].

## Health report:
- The project remains healthy. Activity levels on mailing lists, issues, and pull requests remain constant.

## PMC changes:
- Currently 34 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Nick Felts on Thu Mar 22 2018

## Committer base changes:
- All committers are also PMC members; see the PMC Changes section for details

## Releases:
- accumulo-2.0.0-alpha-2 was released on Wed Jan 30 2019

## Mailing list activity:
- Nothing significant in the figures

## Issue activity:
- 78 issues created [4] and 55 closed [5] across all the Accumulo repos since the last report.
- 153 pull requests created [6] and 152 closed [7] across all the Accumulo repos since the last report.

[1]: https://accumulo.apache.org/release/accumulo-2.0.0-alpha-2/
[2]: https://accumulo.apache.org/release/accumulo-1.9.3/
[3]: https://lists.apache.org/thread.html/9817962004326e233b8360f945420a3ffed4526f181098aaf4b76e66@%3Cdev.accumulo.apache.org%3E
[4]: https://github.com/search?q=is:issue+created:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[5]: https://github.com/search?q=is:issue+closed:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[6]: https://github.com/search?q=is:pr+created:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[7]:
https://github.com/search?q=is:pr+closed:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
Re: [VOTE] Apache Accumulo 1.9.3-rc2
Again, like I included earlier:
> (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.)

On 4/1/19 1:56 PM, Christopher wrote: In what way?

On Mon, Apr 1, 2019 at 1:54 PM Josh Elser wrote: Your email template is wrong.

On 4/1/19 1:33 PM, Christopher wrote: Sorry, I don't understand what you mean by 'retelling of "checksums of old"'.

On Mon, Apr 1, 2019 at 12:30 PM Josh Elser wrote: I think Mike's point was that your VOTE template does not reflect the retelling of "checksums of old":
> (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.)

On 3/31/19 10:54 PM, Christopher wrote: Mike, We already stopped using md5 and sha1 for the release artifacts on the mirrors. I did this some time ago, and we discussed it on-list in previous vote threads (last year)... which resulted in me changing the release candidate build script's automated tooling to embed the SHA512 sums for the tarballs directly in the release vote message. I even went back and updated the downloads page for the previous releases and updated the mirrors to be SHA512-only. Because of these steps, Accumulo was one of the first projects across the entire ASF to be 100% compliant immediately after the INFRA VP updated the release distribution policy you linked. *This is a resolved action for Accumulo.* FWIW, SHA512 was also used as the hash algorithm in the GPG signature (same as every RC I've ever prepped for the ASF). The only remaining md5 and sha1 references are in Maven-specific tooling, and we have no control over that tooling. We could change the vote template to no longer mention them, but I don't see the point, since they're still relevant within the context of Maven artifact hosting, and that's the context in which they are presented in the vote email.

On Sun, Mar 31, 2019 at 1:59 PM Michael Wall wrote: -1 for the issue with commons config. I checked the signatures; they are good.
We should stop using md5 and sha1, though; see https://www.apache.org/dev/release-distribution#sigs-and-sums. Has anyone looked at moving to sha256 and/or sha512?

Successful run of mvn clean verify -Psunny

On Sat, Mar 30, 2019 at 11:31 PM Keith Turner wrote: I completed a continuous ingest run on a 10-node cluster running CentOS 7. I used the native map. I had to rebuild Accumulo to work around #1065 in order to get the verify M/R job to run.
org.apache.accumulo.test.continuous.ContinuousVerify$Counts REFERENCED=34417110819 UNREFERENCED=9097524

On Wed, Mar 27, 2019 at 7:57 PM Christopher wrote: Accumulo Developers, Please consider the following candidate for Apache Accumulo 1.9.3. This supersedes RC1 and contains the following change: https://github.com/apache/accumulo/pull/1057

Git Commit: 94f9782242a1f336e176c282f0f90063a21e361d
Branch: 1.9.3-rc2

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.9.3' -s rel/1.9.3 \
  94f9782242a1f336e176c282f0f90063a21e361d

Staging repo: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077
Source (official release artifact): https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-src.tar.gz
Binary: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.)
In addition to the tarballs and their signatures, the following checksum files will be added to the dist/release SVN area after release:
accumulo-1.9.3-src.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-src.tar.gz) = b366b89295b1835038cb242f8ad46b1d8455753a987333f0e15e3d89749540f2cd59db1bc6cf7100fc9050d3d0bc7340a3b661381549d40f2f0223d4120fd809
accumulo-1.9.3-bin.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-bin.tar.gz) = cc909296d9bbd12e08064fccaf21e81b754c183a8264dfa2575762c76705fd0c580b50c2b224c60feaeec120bd618fba4d6176d0f53e96e1ca9da0d9e2556f1f

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D)
Release notes (in progress) can be found at: https://accumulo.apache.org/release/accumulo-1.9.3/
Release testing instructions: https://accumulo.apache.org/contributor/verifying-release

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.9.3 release of Apache Accumulo.

This vote will remain open until at least Sun Mar 31 00:00:00 UTC 2019. (Sat Mar 30 20:00:00 EDT 2019 / Sat Mar 30 17:00:00 PDT 2019) Voting can continue after this deadline until the release manager sends an email ending the vote.
Re: [VOTE] Apache Accumulo 1.9.3-rc2
Your email template is wrong.

On 4/1/19 1:33 PM, Christopher wrote: Sorry, I don't understand what you mean by 'retelling of "checksums of old"'.

On Mon, Apr 1, 2019 at 12:30 PM Josh Elser wrote: I think Mike's point was that your VOTE template does not reflect the retelling of "checksums of old":
> (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.)

On 3/31/19 10:54 PM, Christopher wrote: Mike, We already stopped using md5 and sha1 for the release artifacts on the mirrors. I did this some time ago, and we discussed it on-list in previous vote threads (last year)... which resulted in me changing the release candidate build script's automated tooling to embed the SHA512 sums for the tarballs directly in the release vote message. I even went back and updated the downloads page for the previous releases and updated the mirrors to be SHA512-only. Because of these steps, Accumulo was one of the first projects across the entire ASF to be 100% compliant immediately after the INFRA VP updated the release distribution policy you linked. *This is a resolved action for Accumulo.* FWIW, SHA512 was also used as the hash algorithm in the GPG signature (same as every RC I've ever prepped for the ASF). The only remaining md5 and sha1 references are in Maven-specific tooling, and we have no control over that tooling. We could change the vote template to no longer mention them, but I don't see the point, since they're still relevant within the context of Maven artifact hosting, and that's the context in which they are presented in the vote email.

On Sun, Mar 31, 2019 at 1:59 PM Michael Wall wrote: -1 for the issue with commons config. I checked the signatures; they are good. We should stop using md5 and sha1, though; see https://www.apache.org/dev/release-distribution#sigs-and-sums. Has anyone looked at moving to sha256 and/or sha512?
Successful run of mvn clean verify -Psunny On Sat, Mar 30, 2019 at 11:31 PM Keith Turner wrote: I completed a continuous ingest run on a 10 node cluster running Centos 7. I used the native map. I had to rebuild Accumulo to work around #1065 in order to get the verify M/R job to run. org.apache.accumulo.test.continuous.ContinuousVerify$Counts REFERENCED=34417110819 UNREFERENCED=9097524 On Wed, Mar 27, 2019 at 7:57 PM Christopher wrote: Accumulo Developers, Please consider the following candidate for Apache Accumulo 1.9.3. This supersedes RC1 and contains the following change: https://github.com/apache/accumulo/pull/1057 Git Commit: 94f9782242a1f336e176c282f0f90063a21e361d Branch: 1.9.3-rc2 If this vote passes, a gpg-signed tag will be created using: git tag -f -m 'Apache Accumulo 1.9.3' -s rel/1.9.3 \ 94f9782242a1f336e176c282f0f90063a21e361d Staging repo: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077 Source (official release artifact): https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-src.tar.gz Binary: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-bin.tar.gz (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.) 
In addition to the tarballs, and their signatures, the following checksum files will be added to the dist/release SVN area after release: accumulo-1.9.3-src.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-src.tar.gz) = b366b89295b1835038cb242f8ad46b1d8455753a987333f0e15e3d89749540f2cd59db1bc6cf7100fc9050d3d0bc7340a3b661381549d40f2f0223d4120fd809 accumulo-1.9.3-bin.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-bin.tar.gz) = cc909296d9bbd12e08064fccaf21e81b754c183a8264dfa2575762c76705fd0c580b50c2b224c60feaeec120bd618fba4d6176d0f53e96e1ca9da0d9e2556f1f Signing keys are available at https://www.apache.org/dist/accumulo/KEYS (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D) Release notes (in progress) can be found at: https://accumulo.apache.org/release/accumulo-1.9.3/ Release testing instructions: https://accumulo.apache.org/contributor/verifying-release Please vote one of: [ ] +1 - I have verified and accept... [ ] +0 - I have reservations, but not strong enough to vote against... [ ] -1 - Because..., I do not accept... ... these artifacts as the 1.9.3 release of Apache Accumulo. This vote will remain open until at least Sun Mar 31 00:00:00 UTC 2019. (Sat Mar 30 20:00:00 EDT 2019 / Sat Mar 30 17:00:00 PDT 2019) Voting can continue after this deadline until the release manager sends an email ending the vote. Thanks! P.S. Hint: download the whole staging repo with wget -erobots=off -r -l inf -np -nH \ https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/ # note the trailing slash is needed
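For readers following along, the checksum step in the vote email can be exercised locally. The sketch below is illustrative only: it creates a stand-in file rather than downloading the real tarball (the filename is reused from the email), and the signature check is shown only as a comment.

```shell
# Illustrative only: verify a SHA512 digest the way the vote email describes.
# A stand-in file is created here instead of the real 1.9.3 source download.
printf 'stand-in artifact contents\n' > accumulo-1.9.3-src.tar.gz
# Record the expected digest. In a real check this value comes from the
# .sha512 file published in the dist/release SVN area, not computed locally.
sha512sum accumulo-1.9.3-src.tar.gz | awk '{print $1}' > expected.sha512
# Recompute the digest of the downloaded file and compare.
actual=$(sha512sum accumulo-1.9.3-src.tar.gz | awk '{print $1}')
if [ "$actual" = "$(cat expected.sha512)" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
# The GPG step would be, after importing the KEYS file:
#   gpg --verify accumulo-1.9.3-src.tar.gz.asc accumulo-1.9.3-src.tar.gz
```

The same comparison works for the binary tarball; only the filenames change.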
Re: [VOTE] Apache Accumulo 1.9.3-rc2
I think Mike's point was your VOTE template does not reflect the retelling of "checksums of old" > (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.) On 3/31/19 10:54 PM, Christopher wrote: Mike, We already stopped using md5 and sha1 for the release artifacts on the mirrors. I did this some time ago, and we discussed it on list on previous vote threads (last year)... which resulted in me changing the release candidate build script's automated tooling to embed the SHA512 sums for the tarballs directly in the release vote message. I even went back and updated the downloads page for the previous releases and updated the mirrors to be SHA512 only. Because of these steps I took, Accumulo was one of the first projects across the entire ASF that were 100% compliant immediately after INFRA VP updated the release distribution policy you linked. *This is a resolved action for Accumulo.* FWIW, SHA512 was also used as the hash algorithm in the GPG signature (same as every RC I've ever prepped for ASF). The only remaining md5 and sha1 references are in Maven-specific tooling, and we have no control over that tooling. We could change the vote template to no longer mention them, but I don't see the point since they're still relevant within the context of Maven artifact hosting, and that's the context in which they are presented in the vote email. On Sun, Mar 31, 2019 at 1:59 PM Michael Wall wrote: -1 for the issue with commons config. I checked the signatures; they are good. We should stop using md5 and sha1 though, see https://www.apache.org/dev/release-distribution#sigs-and-sums. Has anyone looked at moving to sha256 and/or sha512? Successful run of mvn clean verify -Psunny On Sat, Mar 30, 2019 at 11:31 PM Keith Turner wrote: I completed a continuous ingest run on a 10 node cluster running Centos 7. I used the native map. I had to rebuild Accumulo to work around #1065 in order to get the verify M/R job to run. 
org.apache.accumulo.test.continuous.ContinuousVerify$Counts REFERENCED=34417110819 UNREFERENCED=9097524 On Wed, Mar 27, 2019 at 7:57 PM Christopher wrote: Accumulo Developers, Please consider the following candidate for Apache Accumulo 1.9.3. This supersedes RC1 and contains the following change: https://github.com/apache/accumulo/pull/1057 Git Commit: 94f9782242a1f336e176c282f0f90063a21e361d Branch: 1.9.3-rc2 If this vote passes, a gpg-signed tag will be created using: git tag -f -m 'Apache Accumulo 1.9.3' -s rel/1.9.3 \ 94f9782242a1f336e176c282f0f90063a21e361d Staging repo: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077 Source (official release artifact): https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-src.tar.gz Binary: https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-bin.tar.gz (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.) 
In addition to the tarballs, and their signatures, the following checksum files will be added to the dist/release SVN area after release: accumulo-1.9.3-src.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-src.tar.gz) = b366b89295b1835038cb242f8ad46b1d8455753a987333f0e15e3d89749540f2cd59db1bc6cf7100fc9050d3d0bc7340a3b661381549d40f2f0223d4120fd809 accumulo-1.9.3-bin.tar.gz.sha512 will contain: SHA512 (accumulo-1.9.3-bin.tar.gz) = cc909296d9bbd12e08064fccaf21e81b754c183a8264dfa2575762c76705fd0c580b50c2b224c60feaeec120bd618fba4d6176d0f53e96e1ca9da0d9e2556f1f Signing keys are available at https://www.apache.org/dist/accumulo/KEYS (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D) Release notes (in progress) can be found at: https://accumulo.apache.org/release/accumulo-1.9.3/ Release testing instructions: https://accumulo.apache.org/contributor/verifying-release Please vote one of: [ ] +1 - I have verified and accept... [ ] +0 - I have reservations, but not strong enough to vote against... [ ] -1 - Because..., I do not accept... ... these artifacts as the 1.9.3 release of Apache Accumulo. This vote will remain open until at least Sun Mar 31 00:00:00 UTC 2019. (Sat Mar 30 20:00:00 EDT 2019 / Sat Mar 30 17:00:00 PDT 2019) Voting can continue after this deadline until the release manager sends an email ending the vote. Thanks! P.S. Hint: download the whole staging repo with wget -erobots=off -r -l inf -np -nH \ https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/ # note the trailing slash is needed
Re: Combining output of multiple filters/iterators
You cannot feasibly hold onto some intermediate batch of nodes in memory. You're invalidating the general premise of how Accumulo iterators are meant to work in doing this. Further, an Iterator can _only_ safely operate within one row of a table. Two adjacent rows may be located on two different physical machines. Would suggest you read through this presentation and try to take some time to understand why they did it this way: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf. You might also be able to take something from Dylan Hutchison's work on Graphulo: https://arxiv.org/abs/1606.07085 On 3/29/19 2:20 PM, Enas Alkawasmi wrote: Thank you for this suggestion. I have one question: can I pass options to the new source that are from the result of the current iterator? The new iterator needs to get the parent nodes from the current one; how can I force the iterator to wait for the result from its preceding iterator? -- Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html
Re: Adding additional default JVM parameters to accumulo-env.sh boosts performance and prevents crashing of Tservers and masters.
Very rarely do JVM GC properties universally apply to all users and workloads. I think it would be better to document why these options helped in your workload. Teaching folks how to choose the correct JVM properties for their workloads is a better way forward than encouraging them to treat the properties as black boxes. While this is extremely in-depth, I like the tone of this blog post: https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase. The authors explain what they observed from a system, what they changed, what effect that change should have, and, finally, the effect they actually observed. On 3/22/19 9:28 AM, Jeffrey Zeiberg wrote: Jeffrey Manno (ASRC Federal) and Jeffrey Zeiberg (ASRC Federal) have discovered that adding a few new JVM options to the JAVA_OPTS set prevented crashing and increased system performance. They were added after line 68 in accumulo-env.sh. They are: '-server' '-XX:+UseParallelOldGC' '-XX:NewRatio=3' '-XX:+AggressiveHeap' The machines we are using are 7-year-old machines with 8G of main memory, 500G-1T HDDs, and Intel i5 or i7 processors. Maybe these parameters should be added to the Accumulo 2.0 distribution's accumulo-env.sh file? Comments?
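For skimmers, a minimal sketch of what "added to JAVA_OPTS in accumulo-env.sh" looks like. This is illustrative, not the project's shipped accumulo-env.sh; the `-Xmx1g` stand-in is an assumption, and note the JVM spells the last flag `-XX:+AggressiveHeap` (the '+' matters). As the reply argues, whether these flags help is workload-dependent, so measure before adopting them.

```shell
# Sketch only: appending the thread's GC options to an existing JAVA_OPTS
# string, roughly as one would inside accumulo-env.sh (stand-in values).
JAVA_OPTS="-Xmx1g"   # pretend this is whatever the script already set
JAVA_OPTS="$JAVA_OPTS -server -XX:+UseParallelOldGC -XX:NewRatio=3 -XX:+AggressiveHeap"
# The server processes would then be launched with $JAVA_OPTS; here we
# just show the resulting option string.
echo "$JAVA_OPTS"
```

The point of the reply stands: each flag here changes collector choice or generation sizing, so the right values depend on heap size and workload.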
Re: Builds
I feel like trying to put the Jenkinsfile in a separate branch might cause more headache than it's worth. Did you happen to stumble on AW's write-up on a similar subject? https://effectivemachines.com/2019/01/24/using-apache-yetus-with-jenkins-and-github-part-1/ On 2/5/19 2:35 PM, Michael Wall wrote: I think I see a path forward using a Jenkinsfile. Please comment on this plan if something doesn't make sense or someone does not agree. 1 - Create a new branch off master called jenkinsfile or such and add a Jenkinsfile that runs only unit tests but does it in Docker. 2 - Create a new job on jenkins.revelc.net that only builds that branch 3 - Iterate on the Jenkinsfile until it works cleanly 4 - Reconfigure the Accumulo-Master job on builds.apache.org to use the Jenkinsfile From there it will hopefully make sense about what to do next. Maybe a Jenkinsfile-IT or something. Christopher, looks like I already have permission on jenkins.revelc.net to make jobs. Thanks Mike On Sun, Jan 6, 2019 at 10:32 PM Christopher wrote: I've seen similar problems with processes left behind on my own Jenkins instance, but my solution is to periodically log in and nuke the processes, and even occasionally reboot the instance. I don't think these options are available to us on builds.apache.org, because we don't have direct access to them. I'm also not sure why it happens or why Jenkins doesn't properly clean up child processes leftover from no-longer-running builds it launched. One suggestion was to run in Docker, but I don't know how to do that. I attempted to do it that way, and got as far as Jenkins running and connecting to the Docker instance, but the Maven versions available to configure the Maven build do not seem to match what's inside the Docker container, and the job quickly fails, so it's not clear how to do a Maven build in Jenkins with Docker given the options INFRA has made available to us. Perhaps if they had instructions, or an example Maven job we could model our jobs after? 
On Sun, Jan 6, 2019 at 9:34 PM Michael Wall wrote: Anyone look at this yet? https://lists.apache.org/thread.html/e78b1b8ccaf11eb5cb557ec29d3208c3fec0450fd2b908b3f7922c56@%3Cbuilds.apache.org%3E Not sure who even has karma to do anything here https://builds.apache.org/view/A/view/Accumulo/
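The containerized-build idea in this thread boils down to running Maven inside a disposable container, so any stray build processes die with it. A dry-run sketch, with the image tag (`maven:3-jdk-8`) and mount paths as assumptions; the command is only printed here, not executed.

```shell
# Dry-run sketch of step 1 of the plan above: unit tests inside Docker.
# The maven:3-jdk-8 image and /ws workdir are illustrative assumptions.
BUILD_CMD='docker run --rm -v "$PWD":/ws -w /ws maven:3-jdk-8 mvn -B clean test'
echo "Would run: $BUILD_CMD"
```

The `--rm` flag is what addresses the leftover-process complaint: when the build ends, the container and everything it spawned are gone.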
Re: How to Perform a True Update of a Record?
Why are you trying to do this in the first place? When you write a new version of a cell, you are essentially replacing the old value. Leaving the old value in the table and lazily removing it (via compaction) is a core optimization to the write-path for Accumulo (from BigTable itself). I'm having a hard time understanding why you're trying to do what you're asking. On 1/23/19 1:26 PM, gtotsline wrote: Hi Mike - Thanks for responding so quickly, it's greatly appreciated. The ConditionalWriter does not appear to address our use case where we actually want to suppress Accumulo versioning of a record based on the value in the record that was read vs. input data our system receives. Is there a way to dynamically suppress Accumulo record versioning? -- Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html
Re: 2.0.0-alpha-2
YCSB is probably the easiest way to do workload-specific comparisons. On 1/16/19 11:19 AM, Mike Miller wrote: I think we can start doing that now with the alpha release, I am just not sure how. Did you have any ideas? On Wed, Jan 16, 2019 at 12:31 AM Sean Busbey wrote: Has anyone gotten to do a perf comparison to 1.9 yet? The time to do that would be during beta I guess? On Tue, Jan 15, 2019 at 5:18 PM Christopher wrote: I'm planning to prepare a 2.0.0-alpha-2-rc1 Thursday. So, merge your stuff if you want me to include it then. Depending on the quality of this alpha, we may want to do a beta (with an API freeze?)... or just release 2.0.0 next. I wouldn't think we'd want too many alphas. -- busbey
Re: [DRAFT] [REPORT] Apache Accumulo - Jan 2019
On 1/7/19 11:26 AM, Michael Wall wrote: Hi Josh, thanks for reviewing. The "PMC Changes" was copied directly from the reporter.apache.org template. The "Committer base changes" was in response to feedback from the board several reports ago asking if all committers were PMC members. I could say something like "All Committers are also PMC members, see the PMC Changes section for details". Is that what you mean by more explicit? Or do you mean something else? I had initially thought just copy/pasting the text from the PMC additions was good, but, on second thought, suggesting that readers look at the PMC additions section in the Committer additions section is just as good. Thanks for the careful attention, Mike!
Re: Slack for Accumulo
Just made one -- I only saw Mike Wall on slack so far (invited him). On 12/10/18 4:31 PM, Michael Wall wrote: Yeah, there are some Apache projects that use slack. Can you create an Accumulo channel at https://the-asf.slack.com? I don't recall what I did to set up my account there, but I did have to use my apache.org email account. The only thing I see from other communities is a small delay for users since they need to be given access to the channel. +1 to shutting down IRC. Mike On Mon, Dec 10, 2018 at 4:11 PM Christopher wrote: We did have a HipChat room, but we didn't advertise it well and nobody really used it. With all that happened with HipChat being bought by Atlassian and then sold off to Slack (or something like that), I'm not sure where ASF Infra currently stands on providing chat as an ASF resource. I have no objections to trying out Slack within the project. However, if Infra does move towards something in future (like an official ASF Slack group), we should try to follow whatever path they pave. Regardless of whether Slack pans out, we should probably shut down the IRC room... hardly anybody uses it, and the few that do are lurkers only or bots. It doesn't make sense to continue to advertise it as a way for users/devs to contact us. On Mon, Dec 10, 2018, 15:44 Mike Walch wrote: I would like to create a Slack chatroom for Accumulo and advertise it on our 'contact us' page [1] with an invite link to make it easy to join. Is anyone opposed to this? There will be no requirement that Accumulo users or developers use Slack. We currently have an IRC chatroom but it's not used much. I think Slack will be used more as it's simpler and saves the latest history which helps if you join an ongoing discussion. If it ends up being rarely used over the next few months, I am OK with shutting it down. [1]: https://accumulo.apache.org/contact-us/
Re: commons-vfs2.jar 2.2 buggy
It seems like commons-vfs2 is just a pile of crap. It's been known to have bugs for years and we've seen zero progress from them on making them better. IMO, rip the whole damn thing out. On 10/24/18 12:42 PM, Matthew Peterson wrote: Hello Accumulo, Summary: commons-vfs2 version 2.2 seems to have problems and it may be worth rolling back to version 2.1 of commons-vfs2. My project upgraded a system from Accumulo 1.8.1 to 1.9.2. Immediately after switching vfs contexts we saw problems. The tservers would error in iterators about missing classes that were clearly on the classpath. The problems were persistent until we replaced the commons-vfs2.jar with version 2.1 (Accumulo 1.9.2 uses version 2.2). Until we rolled vfs back, we received errors particularly with Spring code trying to access various classes and files within the jars. It looks like in 2.2, commons-vfs implemented a doDetach method which closed the zip files. We suspect that code is the problem but haven't tested that theory. I suspect that most users don't use this feature. Thanks! Matt
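Matt's workaround amounts to swapping one jar in the server's lib directory. A hedged sketch of that swap, using a temporary stand-in for `$ACCUMULO_HOME/lib` and empty placeholder files rather than real jars:

```shell
# Illustrative jar swap for the rollback described above. LIB_DIR is a
# stand-in for $ACCUMULO_HOME/lib; the jars here are empty placeholders.
LIB_DIR=$(mktemp -d)
touch "$LIB_DIR/commons-vfs2-2.2.jar"   # version bundled with Accumulo 1.9.2
rm "$LIB_DIR/commons-vfs2-2.2.jar"      # remove the reportedly buggy 2.2 jar
touch "$LIB_DIR/commons-vfs2-2.1.jar"   # drop in 2.1 instead
ls "$LIB_DIR"                           # tservers pick up the change on restart
```

In a real deployment the 2.1 jar would of course come from Maven Central, and the swap must happen on every node before restarting the tservers.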
Re: [DISCUSS] 2.0.0-alpha?
On 10/9/18 2:10 PM, Keith Turner wrote: On Tue, Oct 9, 2018 at 1:52 PM Keith Turner wrote: On Tue, Oct 9, 2018 at 12:53 PM Josh Elser wrote: On 10/9/18 12:44 PM, Keith Turner wrote: On Sat, Oct 6, 2018 at 12:27 AM Christopher wrote: Hi Accumulo devs, I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we can have an official ASF release (albeit without the usual stability expectations as a normal release) to be available for the upcoming Accumulo Summit. An alpha version would signal our progress towards 2.0.0 final, serve as a basis for testing, and give us something to share with a wider audience to solicit feedback on the API, configuration, and module changes. Of course, it would still have to meet ASF release requirements... like licensing and stuff, and it should essentially work (so people can actually run tests), but in an alpha release, we could tolerate flaws we wouldn't in a final release. Ideally, I would have preferred a 2.0.0 final at this point in the year, but I think it needs more testing. Does an alpha release next week seem reasonable to you? I am in favor of an Alpha release. Also, Alpha releases imply feature freeze in some projects. I am in favor of feature freeze. Is anyone opposed to feature freeze? Below is what feature freeze means to me. We agree to avoid adding new features for 2.0 AND work on 2.0 will focus on bug fixes and polishing features added before the Alpha. This polishing work could result in API changes. If anyone really wants to add a new feature, they should discuss it on the mailing list. No concerns with an alpha also implying a feature-freeze. That does mean that it should be even more straightforward to have a complete list of the features landing in 2.0.0 ;) (which remains my only concern) Are you concerned about not completing the release notes before an alpha vote? Or is your concern something else? Personally, I would like to see the release notes completed before 2.0.0-alpha is announced. 
I can't think of compelling reasons to complete it earlier than that. However, it seems critical to complete them before announcing. It's in the same line of thinking that Sean stated: > "I'd really like us to put 2.0 GA readiness in terms of feature / correctness goals rather than a strict time limit." Such a major release like 2.0 without clear reasons why users should care strikes me as very "so what?".
Re: [DISCUSS] 2.0.0-alpha?
On 10/9/18 12:44 PM, Keith Turner wrote: On Sat, Oct 6, 2018 at 12:27 AM Christopher wrote: Hi Accumulo devs, I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we can have an official ASF release (albeit without the usual stability expectations as a normal release) to be available for the upcoming Accumulo Summit. An alpha version would signal our progress towards 2.0.0 final, serve as a basis for testing, and give us something to share with a wider audience to solicit feedback on the API, configuration, and module changes. Of course, it would still have to meet ASF release requirements... like licensing and stuff, and it should essentially work (so people can actually run tests), but in an alpha release, we could tolerate flaws we wouldn't in a final release. Ideally, I would have preferred a 2.0.0 final at this point in the year, but I think it needs more testing. Does an alpha release next week seem reasonable to you? I am in favor of an Alpha release. Also, Alpha releases imply feature freeze in some projects. I am in favor of feature freeze. Is anyone opposed to feature freeze? Below is what feature freeze means to me. We agree to avoid adding new features for 2.0 AND work on 2.0 will focus on bug fixes and polishing features added before the Alpha. This polishing work could result in API changes. If anyone really wants to add a new feature, they should discuss it on the mailing list. No concerns with an alpha also implying a feature-freeze. That does mean that it should be even more straightforward to have a complete list of the features landing in 2.0.0 ;) (which remains my only concern)
Re: [DISCUSS] 2.0.0-alpha?
Ah, yes. I think you're right. Thanks again :) On 10/9/18 12:32 PM, Mike Miller wrote: Didn't RFile summaries show up in 1.9 too? (maybe I'm inventing that) I think you are thinking of Sampling, that was released in 1.8.0, showing up in 1.9. I still get them confused. They both are similar and start with S. On Tue, Oct 9, 2018 at 12:03 PM Josh Elser wrote: Thanks, Mike. Didn't RFile summaries show up in 1.9 too? (maybe I'm inventing that) On 10/9/18 11:39 AM, Mike Miller wrote: I think once we collect all the changes in 2.0 (there are a lot) we will be able to create some bullet points, picking out changes most interesting to users. The new bulk import process Keith, Mark and I worked on should be one. There are many new features that come along with it that weren't possible. There was all the work Mike did for usability that he is presenting at the summit and wrote a blog post about 2 years ago: https://accumulo.apache.org/blog/2016/11/16/simpler-scripts-and-config.html Rfile Summaries was a big change but happened a while ago. Recently, the new Crypto service and new AccumuloClient builder are some other features that come to mind. On Mon, Oct 8, 2018 at 9:05 PM Josh Elser wrote: Frankly, planning a release without even an idea of what is going into it seems like a waste of time to me. I didn't ask these questions to try to squash such a release; I don't think they're particularly difficult to figure out. Just curious what the release notes would look like (as a user, this is what I would care about). I don't think I'm alone. On Mon, Oct 8, 2018, 19:33 Christopher wrote: I don't know the answers to these questions. I just want to put a stake in the ground before the Accumulo Summit, so we have a basis for evaluation and testing, and answering some of these unknowns. On Mon, Oct 8, 2018 at 11:28 AM Josh Elser wrote: I would like to know what the scope of 2.0 is. Specifically: * What's new in this 2.0 alpha that is driving the release? 
* Is there anything else expected to land post-alpha/pre-GA? On 10/6/18 1:36 PM, Sean Busbey wrote: yes alphas please. Do we want to talk about expectations on time between alpha releases? What kind of criteria for beta or GA? a *lot* has changed in the 2.0 codebase. On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman wrote: +1 In addition to the reasons stated by Christopher, I think that it also provides a clearer signal to earlier adopters that the public API *may* change before the formal release. With a formal release candidate, I interpret that it signals that only bug-fixes would occur up and until the formal release. With the length of time that we take between minor and patch releases, the even longer time that it takes the customer base to upgrade and development cost that we have supporting multiple branches, taking some extra time now to solicit feedback seems prudent. While the specifics and implications of semver are clear, sometimes it seems that there is additional weight and additional perceived risk when changing major versions, an alpha version preserves our flexibility while still moving forward. Ed Coleman -Original Message- From: Christopher [mailto:ctubb...@apache.org] Sent: Saturday, October 06, 2018 12:28 AM To: accumulo-dev Subject: [DISCUSS] 2.0.0-alpha? Hi Accumulo devs, I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we can have an official ASF release (albeit without the usual stability expectations as a normal release) to be available for the upcoming Accumulo Summit. An alpha version would signal our progress towards 2.0.0 final, serve as a basis for testing, and give us something to share with a wider audience to solicit feedback on the API, configuration, and module changes. Of course, it would still have to meet ASF release requirements... like licensing and stuff, and it should essentially work (so people can actually run tests), but in an alpha release, we could tolerate flaws we wouldn't in a final release. 
Ideally, I would have preferred a 2.0.0 final at this point in the year, but I think it needs more testing. Does an alpha release next week seem reasonable to you? Christopher
Re: [DISCUSS] 2.0.0-alpha?
Thanks, Mike. Didn't RFile summaries show up in 1.9 too? (maybe I'm inventing that) On 10/9/18 11:39 AM, Mike Miller wrote: I think once we collect all the changes in 2.0 (there are a lot) we will be able to create some bullet points, picking out changes most interesting to users. The new bulk import process Keith, Mark and I worked on should be one. There are many new features that come along with it that weren't possible. There was all the work Mike did for usability that he is presenting at the summit and wrote a blog post about 2 years ago: https://accumulo.apache.org/blog/2016/11/16/simpler-scripts-and-config.html Rfile Summaries was a big change but happened a while ago. Recently, the new Crypto service and new AccumuloClient builder are some other features that come to mind. On Mon, Oct 8, 2018 at 9:05 PM Josh Elser wrote: Frankly, planning a release without even an idea of what is going into it seems like a waste of time to me. I didn't ask these questions to try to squash such a release; I don't think they're particularly difficult to figure out. Just curious what the release notes would look like (as a user, this is what I would care about). I don't think I'm alone. On Mon, Oct 8, 2018, 19:33 Christopher wrote: I don't know the answers to these questions. I just want to put a stake in the ground before the Accumulo Summit, so we have a basis for evaluation and testing, and answering some of these unknowns. On Mon, Oct 8, 2018 at 11:28 AM Josh Elser wrote: I would like to know what the scope of 2.0 is. Specifically: * What's new in this 2.0 alpha that is driving the release? * Is there anything else expected to land post-alpha/pre-GA? On 10/6/18 1:36 PM, Sean Busbey wrote: yes alphas please. Do we want to talk about expectations on time between alpha releases? What kind of criteria for beta or GA? a *lot* has changed in the 2.0 codebase. 
On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman wrote: +1 In addition to the reasons stated by Christopher, I think that it also provides a clearer signal to earlier adopters that the public API *may* change before the formal release. With a formal release candidate, I interpret that it signals that only bug-fixes would occur up and until the formal release. With the length of time that we take between minor and patch releases, the even longer time that it takes the customer base to upgrade and development cost that we have supporting multiple branches, taking some extra time now to solicit feedback seems prudent. While the specifics and implications of semver are clear, sometimes it seems that there is additional weight and additional perceived risk when changing major versions, an alpha version preserves our flexibility while still moving forward. Ed Coleman -Original Message- From: Christopher [mailto:ctubb...@apache.org] Sent: Saturday, October 06, 2018 12:28 AM To: accumulo-dev Subject: [DISCUSS] 2.0.0-alpha? Hi Accumulo devs, I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we can have an official ASF release (albeit without the usual stability expectations as a normal release) to be available for the upcoming Accumulo Summit. An alpha version would signal our progress towards 2.0.0 final, serve as a basis for testing, and give us something to share with a wider audience to solicit feedback on the API, configuration, and module changes. Of course, it would still have to meet ASF release requirements... like licensing and stuff, and it should essentially work (so people can actually run tests), but in an alpha release, we could tolerate flaws we wouldn't in a final release. Ideally, I would have preferred a 2.0.0 final at this point in the year, but I think it needs more testing. Does an alpha release next week seem reasonable to you? Christopher
Re: [DISCUSS] 2.0.0-alpha?
Frankly, planning a release without even an idea of what is going into it seems like a waste of time to me. I didn't ask these questions to try to squash such a release; I don't think they're particularly difficult to figure out. Just curious what the release notes would look like (as a user, this is what I would care about). I don't think I'm alone. On Mon, Oct 8, 2018, 19:33 Christopher wrote: > I don't know the answers to these questions. I just want to put a > stake in the ground before the Accumulo Summit, so we have a basis for > evaluation and testing, and answering some of these unknowns. > On Mon, Oct 8, 2018 at 11:28 AM Josh Elser wrote: > > > > I would like to know what the scope of 2.0 is. Specifically: > > > > * What's new in this 2.0 alpha that is driving the release? > > * Is there anything else expected to land post-alpha/pre-GA? > > > > On 10/6/18 1:36 PM, Sean Busbey wrote: > > > yes alphas please. Do we want to talk about expectations on time > > > between alpha releases? What kind of criteria for beta or GA? > > > > > > a *lot* has changed in the 2.0 codebase. > > > On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman wrote: > > >> > > >> +1 > > >> > > >> In addition to the reasons stated by Christopher, I think that it > also provides a clearer signal to earlier adopters that the public API > *may* change before the formal release. With a formal release candidate, I > interpret that it signals that only bug-fixes would occur up and until the > formal release. > > >> > > >> With the length of time that we take between minor and patch > releases, the even longer time that it takes the customer base to upgrade > and development cost that we have supporting multiple branches, taking some > extra time now to solicit feedback seems prudent. 
While the specifics and > implications of semver are clear, sometimes it seems that there is > additional weight and additional perceived risk when changing major > versions, an alpha version preserves our flexibility while still moving > forward. > > >> > > >> Ed Coleman > > >> > > >> -Original Message- > > >> From: Christopher [mailto:ctubb...@apache.org] > > >> Sent: Saturday, October 06, 2018 12:28 AM > > >> To: accumulo-dev > > >> Subject: [DISCUSS] 2.0.0-alpha? > > >> > > >> Hi Accumulo devs, > > >> > > >> I'm thinking about initiating a vote next week for a 2.0.0-alpha > release, so we can have an official ASF release (albeit without the usual > stability expectations as a normal release) to be available for the > upcoming Accumulo Summit. > > >> > > >> An alpha version would signal our progress towards 2.0.0 final, serve > as a basis for testing, and give us something to share with a wider > audience to solicit feedback on the API, configuration, and module changes. > Of course, it would still have to meet ASF release requirements... like > licensing and stuff, and it should essentially work (so people can actually > run tests), but in an alpha release, we could tolerate flaws we wouldn't in > a final release. > > >> > > >> Ideally, I would have preferred a 2.0.0 final at this point in the > year, but I think it needs more testing. > > >> > > >> Does an alpha release next week seem reasonable to you? > > >> > > >> Christopher > > >> > > > > > > >
Re: [DISCUSS] 2.0.0-alpha?
I would like to know what the scope of 2.0 is. Specifically: * What's new in this 2.0 alpha that is driving the release? * Is there anything else expected to land post-alpha/pre-GA? On 10/6/18 1:36 PM, Sean Busbey wrote: yes alphas please. Do we want to talk about expectations on time between alpha releases? What kind of criteria for beta or GA? a *lot* has changed in the 2.0 codebase. On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman wrote: +1 In addition to the reasons stated by Christopher, I think that it also provides a clearer signal to earlier adopters that the public API *may* change before the formal release. With a formal release candidate, I interpret that it signals that only bug-fixes would occur up until the formal release. With the length of time that we take between minor and patch releases, the even longer time that it takes the customer base to upgrade, and the development cost that we have supporting multiple branches, taking some extra time now to solicit feedback seems prudent. While the specifics and implications of semver are clear, sometimes it seems that there is additional weight and additional perceived risk when changing major versions; an alpha version preserves our flexibility while still moving forward. Ed Coleman -Original Message- From: Christopher [mailto:ctubb...@apache.org] Sent: Saturday, October 06, 2018 12:28 AM To: accumulo-dev Subject: [DISCUSS] 2.0.0-alpha? Hi Accumulo devs, I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we can have an official ASF release (albeit without the usual stability expectations as a normal release) to be available for the upcoming Accumulo Summit. An alpha version would signal our progress towards 2.0.0 final, serve as a basis for testing, and give us something to share with a wider audience to solicit feedback on the API, configuration, and module changes. Of course, it would still have to meet ASF release requirements... 
like licensing and stuff, and it should essentially work (so people can actually run tests), but in an alpha release, we could tolerate flaws we wouldn't in a final release. Ideally, I would have preferred a 2.0.0 final at this point in the year, but I think it needs more testing. Does an alpha release next week seem reasonable to you? Christopher
Re: LoadPlanTest unit test failure on master
Nevermind, Christopher has already fixed this, it seems. On 9/14/18 10:10 AM, Josh Elser wrote: I'll tag you in a PR. Trivial fix. Don't sweat it :) On 9/14/18 10:09 AM, Keith Turner wrote: I will look into it. This branch was outstanding for a long time. Yesterday I merged it, resolved conflicts, and then only ran mvn compile. I should have run mvn verify again and waited. On Fri, Sep 14, 2018 at 9:11 AM, Josh Elser wrote: Color me surprised: this fails for me out of the box on 7ef140ec40c3768859b848350db8c6d6d20f7a56 [INFO] Running org.apache.accumulo.core.data.LoadPlanTest [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest [ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest) Time elapsed: 0.074 s <<< FAILURE! java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]> at org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93) @Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet dug into the test (seems to be new code since I've touched Accumulo), lobbing this as a softball for now and will investigate as time allows.
Re: LoadPlanTest unit test failure on master
I'll tag you in a PR. Trivial fix. Don't sweat it :) On 9/14/18 10:09 AM, Keith Turner wrote: I will look into it. This branch was outstanding for a long time. Yesterday I merged it, resolved conflicts, and then only ran mvn compile. I should have run mvn verify again and waited. On Fri, Sep 14, 2018 at 9:11 AM, Josh Elser wrote: Color me surprised: this fails for me out of the box on 7ef140ec40c3768859b848350db8c6d6d20f7a56 [INFO] Running org.apache.accumulo.core.data.LoadPlanTest [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest [ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest) Time elapsed: 0.074 s <<< FAILURE! java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]> at org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93) @Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet dug into the test (seems to be new code since I've touched Accumulo), lobbing this as a softball for now and will investigate as time allows.
Re: LoadPlanTest unit test failure on master
Nevermind, this is silly: s/TABLET/TABLE/ On 9/14/18 9:11 AM, Josh Elser wrote: Color me surprised: this fails for me out of the box on 7ef140ec40c3768859b848350db8c6d6d20f7a56 [INFO] Running org.apache.accumulo.core.data.LoadPlanTest [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest [ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest) Time elapsed: 0.074 s <<< FAILURE! java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]> at org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93) @Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet dug into the test (seems to be new code since I've touched Accumulo), lobbing this as a softball for now and will investigate as time allows.
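Josh's one-liner above (s/TABLET/TABLE/) suggests the test's expected strings still used pre-rename enum constants (TABLET, DATA) while the code under test emitted the renamed ones (TABLE, FILE), so every string comparison failed. A minimal self-contained sketch of that kind of mismatch follows; the `RangeType` enum and the `describe()` string format here are reconstructions inferred from the assertion output, not the real LoadPlan API:

```java
// Hypothetical reconstruction of the LoadPlanTest mismatch. The enum and the
// "file:type:start:end" string format are inferred from the assertion output
// (e.g. "f5.rf:TABLE:yyy:null"); they are not copied from Accumulo's LoadPlan API.
public class LoadPlanFixSketch {
    // The failing test's expected strings used old names (TABLET, DATA);
    // the code under test had renamed them to TABLE and FILE.
    enum RangeType { FILE, TABLE }

    static String describe(String file, RangeType type, String start, String end) {
        return file + ":" + type + ":" + start + ":" + end;
    }

    public static void main(String[] args) {
        // After applying s/TABLET/TABLE/ to the expected strings, they match
        // what the code actually emits:
        System.out.println(describe("f5.rf", RangeType.TABLE, "yyy", null));
    }
}
```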
LoadPlanTest unit test failure on master
Color me surprised: this fails for me out of the box on 7ef140ec40c3768859b848350db8c6d6d20f7a56 [INFO] Running org.apache.accumulo.core.data.LoadPlanTest [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest [ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest) Time elapsed: 0.074 s <<< FAILURE! java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]> at org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93) @Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet dug into the test (seems to be new code since I've touched Accumulo), lobbing this as a softball for now and will investigate as time allows.
Re: [DRAFT] [REPORT] Apache Accumulo - July 2018
On 7/9/18 4:54 PM, Michael Wall wrote: Josh, I am not clear on what you are suggesting for another action item. Are you suggesting a pass over contributors to see if anyone should be invited to become a committer, or are you suggesting we revisit that every committer becomes a PMC member? If we say that we have not added any new committers/PMC members in a long period of time, the board will most assuredly say "Have you looked at your contributors to see if you should invite some to be committers?" I was trying to suggest that we proactively tell them "we know we need to see if there are contributors to invite to be committers" in order to save that middle step. No need to do this -- just trying to be helpful based on what I see over and over again from the board :)
Re: [DRAFT] [REPORT] Apache Accumulo - July 2018
+1 definitely. Especially since the organizers went through the proper TM approval steps. I'd suggest expanding that, in addition to no new committers/PMC members, to include an action item. e.g. Should we make a pass over contributors? Or, is participation "constant" (c=pmc makes this a bit easier to put into words ;)) On 7/1/18 5:06 PM, Mike Drob wrote: Worth mentioning upcoming summit? On Sun, Jul 1, 2018, 3:54 PM Michael Wall wrote: The Apache Accumulo PMC decided to draft its quarterly board reports on the dev list. Here is a draft of our report which is due by Wednesday, Jul 11, 1 week before the board meeting on Wednesday, Jul 18. Please let me know if you have any suggestions. I am a little early with this one since I will be on vacation 5-15 Jul. My plan is to submit this report sometime on Mon, Jul 9. Mike -- ## Description: - The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - There were 2 new releases, Accumulo 1.9.0 and Accumulo 1.9.1, since the last report. The 1.9.0 release [1] had a critical bug in the Write Ahead Log (WAL) process that is fixed in 1.9.1 [2]. - There were no new committers since the last report. All committers are also PMC members. - The PMC decided to switch to using github issues for the project and all subprojects [3], which is why the Jira activity dropped off. Github issues and pull request statistics are included below. - Another bug has been found in the WAL process and a 1.9.2 is in the works. ## Health report: - The project remains healthy. Activity levels on mailing lists, git and JIRA remain constant. ## PMC changes: - Currently 34 PMC members. 
- No new PMC members added in the last 3 months - Last PMC addition was Nick Felts on Thu Mar 22 2018 ## Committer base changes: - See PMC changes, all committers are PMC members currently. ## Releases: - accumulo-1.9.0 was released on Tue Apr 17 2018 - accumulo-1.9.1 was released on Sun May 13 2018 ## Mailing list activity: - Nothing significant in the figures ## Issue activity: - 25 issues opened and 19 closed in the last 3 months [4] - 10 pull requests opened and 58 closed in the last 3 months [5] [1]: http://accumulo.apache.org/release/accumulo-1.9.0/ [2]: http://accumulo.apache.org/release/accumulo-1.9.1 [3]: http://accumulo.apache.org/blog/2018/03/16/moving-to-github-issues.html [4]: https://github.com/apache/accumulo/issues?utf8=%E2%9C%93=is%3Aissue+created%3A%3E2018-04-18+ [5]: https://github.com/apache/accumulo/pulls?utf8=%E2%9C%93=is%3Apr+is%3Aopen+created%3A%3E2018-04-18+
Re: [DISCUSS] Draft release timeline for 2.0.0
On 6/12/18 1:20 AM, Christopher wrote: On Mon, Jun 11, 2018 at 10:46 PM Josh Elser wrote: I'm just trying to point out the fallacy of meeting deadlines when the criteria for "success" is undefined. Why? I proposed the timeline to solicit opinions on it. Use whatever subjective criteria you want to inform your own. If you have criteria that you think won't be satisfied within that timeline, then raise them for discussion. Again, I am stating that a timeline with no recognition of what work needs to be done is silly. Yes, you can draw a line in the sand for when you want work to be done, but that's ineffective in making an actionable feature complete date. If you want the date to be meaningful, you need to understand what work actually _has_ to be done and structure the date around that. Does this make sense? If Jira is overburdened, move everything out and have people move things back. We have multiple tools -- we should at least have one in use. Otherwise, this just seems like there are decisions happening behind the scenes. You lost me. Every release, we triage (finish, reject, or bump) open issues; nobody's done that yet for 2.0. That's all I was talking about with regard to the issue tracker noise. I thought you were saying that there were too many open issues on Jira to glean any information on outstanding work from it. I was trying to give a suggestion about how to move past that.
Re: [DISCUSS] Draft release timeline for 2.0.0
On 6/12/18 10:36 AM, Keith Turner wrote: On Mon, Jun 11, 2018 at 10:46 PM, Josh Elser wrote: I'm just trying to point out the fallacy of meeting deadlines when the criteria for "success" is undefined. If Jira is overburdened, move everything out and have people move things back. We have multiple tools -- we should at least have one in use. Otherwise, this just seems like there are decisions happening behind the scenes. To communicate what we would like to see in 2.0.0, I propose opening a Github issue, tagging it as 2.0.0, and marking it as a blocker. We can always triage and discuss the open blockers later in the summer. That'd be great.
Re: [DISCUSS] Draft release timeline for 2.0.0
I'm just trying to point out the fallacy of meeting deadlines when the criteria for "success" is undefined. If Jira is overburdened, move everything out and have people move things back. We have multiple tools -- we should at least have one in use. Otherwise, this just seems like there are decisions happening behind the scenes. On Mon, Jun 11, 2018, 7:52 PM Christopher wrote: > I do not expect that page to be a complete or final set of features right > now, but it's probably better than the issue tracker is (because of all the > noise of old issues). Part of the goal of this thread was to motivate > people to start finalizing that set over the next few weeks as they triage > open issues and think about what they can realistically finish in the > timeline we establish. The hope is that the page will become more and more > complete as we head more strongly towards this release. > > As for the timeline, I have no problem moving the time table up if we get a > bit further along and realize we're in a good place to release. I just > don't like the pressure of unrealistically short timelines, and I know that > personally, my summer is going to be very busy regardless. Initially, I was > hoping we could release around September 1st... but then I figured adding a > month for dedicated testing and documentation might be nice... and we'd > still release before the summit. > > > On Mon, Jun 11, 2018 at 6:36 PM Josh Elser wrote: > > > Based on that, https://issues.apache.org/jira/browse/ACCUMULO-4733 is > > the only thing outstanding (and just one question at that). > > > > Mid/late August seems like a long time until feature-complete for > > essentially a no-op of work :) > > > > On 6/11/18 5:07 PM, Christopher wrote: > > > I believe those are being maintained in the draft release notes at > > > https://accumulo.apache.org/release/accumulo-2.0.0/ > > > > > > On Mon, Jun 11, 2018 at 5:02 PM Josh Elser wrote: > > > > > >> What are the current 2.0.0 features? 
(Outstanding and completed) > > >> > > >> On 6/11/18 4:35 PM, Christopher wrote: > > >>> Hi Accumulo Devs, > > >>> > > >>> I've been thinking about the 2.0.0 release timeline. I was thinking > > >>> something like this milestone timeline: > > >>> > > >>> Feature Complete : mid-late August > > >>> Dedicated Testing, Documentation, and release voting : all of > September > > >>> Final release : October 1st > > >>> > > >>> This schedule would make 2.0.0 available for the Accumulo Summit > coming > > >> up > > >>> in October, with a few weeks to spare. > > >>> > > >> > > > > > >
Re: [DISCUSS] Draft release timeline for 2.0.0
Based on that, https://issues.apache.org/jira/browse/ACCUMULO-4733 is the only thing outstanding (and just one question at that). Mid/late August seems like a long time until feature-complete for essentially a no-op of work :) On 6/11/18 5:07 PM, Christopher wrote: I believe those are being maintained in the draft release notes at https://accumulo.apache.org/release/accumulo-2.0.0/ On Mon, Jun 11, 2018 at 5:02 PM Josh Elser wrote: What are the current 2.0.0 features? (Outstanding and completed) On 6/11/18 4:35 PM, Christopher wrote: Hi Accumulo Devs, I've been thinking about the 2.0.0 release timeline. I was thinking something like this milestone timeline: Feature Complete : mid-late August Dedicated Testing, Documentation, and release voting : all of September Final release : October 1st This schedule would make 2.0.0 available for the Accumulo Summit coming up in October, with a few weeks to spare.
Re: Number of entries
Hi Mauro, Via what means? This information is present on the Accumulo Monitor UI already, lagging only by a compaction happening on relevant Tablets. You can easily look at this data for just about any installation. If via code, I don't believe there is a public (stable) API for requesting table sizes, but the Monitor pulls this data from the accumulo.metadata table. You could do the same. The accumulo.metadata table has a reference to each file contained by a table, and with it the number of entries in that file. It's a simple calculation to compute the number of entries for a table once you can extract the number of entries for a single tablet. On 6/4/18 11:48 AM, Mauro Schneider wrote: Hello, how can I find the number of entries in a table with a ton of data in Accumulo? Mauro Schneider
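The calculation Josh describes can be sketched without a live cluster. Assuming each file reference in the metadata table carries a value of the form "<sizeInBytes>,<numEntries>" (an assumption based on the discussion above, not a documented API), summing the second field over a table's file references gives the entry count. This sketch shows only that arithmetic; a real implementation would obtain the values by scanning the metadata table, and the result lags by whatever has not yet been flushed or compacted:

```java
import java.util.List;

// Illustrative sketch: sum per-file entry counts as they might appear in the
// accumulo.metadata "file" column family. The "size,entries" value format is an
// assumption inferred from the thread above; the metadata scan itself is omitted.
public class TableEntryCountSketch {

    // Parse one metadata file value of the (assumed) form "<bytes>,<entries>".
    static long entriesFromFileValue(String value) {
        String[] parts = value.split(",");
        return Long.parseLong(parts[1].trim());
    }

    // Total entries for a table = sum over all of its tablets' file references.
    static long totalEntries(List<String> fileValues) {
        return fileValues.stream()
                .mapToLong(TableEntryCountSketch::entriesFromFileValue)
                .sum();
    }

    public static void main(String[] args) {
        // Sample values as they might appear for one table's files.
        List<String> sample = List.of("1024,100", "2048,250", "512,50");
        System.out.println(totalEntries(sample)); // prints 400
    }
}
```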
Re: Java (eventually) dropping Serialization
Also see the xolstice protobuf-maven-plugin which marries up nicely with the OS properties. On 5/30/18 3:40 PM, Brian Loss wrote: If I understand what you are asking correctly, os-maven-plugin [1] is what you are looking for. It will determine the os name and arch (it puts them in properties os.detected.name and os.detected.arch) and you can use those values to declare the right executableDependency [2] for the exec-maven-plugin. [1] http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22kr.motd.maven%22%20AND%20a%3A%22os-maven-plugin%22 [2] https://www.mojohaus.org/exec-maven-plugin/examples/example-exec-using-executabledependency.html From: Christopher [ctubb...@apache.org] Sent: Wednesday, May 30, 2018 3:17 PM To: dev@accumulo.apache.org Subject: Re: Java (eventually) dropping Serialization I wasn't aware they were publishing pre-built binaries for various platforms to Maven Central. That could be quite useful if we could automatically download the correct one during the Maven build, and use that to generate the code. It could still be problematic if they are dynamically linked to specific version ranges of system libraries, but I'd be interested in trying. Do you know if that tooling already exists as a Maven plugin or similar?
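The combination Brian and Josh describe (os-maven-plugin detecting the build platform, and the xolstice protobuf-maven-plugin using that to fetch a matching prebuilt protoc from Maven Central) is typically wired together roughly like the following pom.xml fragment. The versions and the protoc coordinate here are illustrative, not a tested configuration:

```xml
<build>
  <extensions>
    <!-- Detects the build platform; exposes it as ${os.detected.classifier} -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.6.2</version>
    </extension>
  </extensions>
  <plugins>
    <!-- Downloads a prebuilt protoc for the detected platform and generates Java sources -->
    <plugin>
      <groupId>org.xolstice.maven.plugins</groupId>
      <artifactId>protobuf-maven-plugin</artifactId>
      <version>0.6.1</version>
      <configuration>
        <protocArtifact>com.google.protobuf:protoc:3.6.1:exe:${os.detected.classifier}</protocArtifact>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

This avoids a local protoc installation entirely on platforms for which Google publishes binaries, which is the point raised below about it being a non-issue for most arches.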
Re: Java (eventually) dropping Serialization
On 5/30/18 12:41 PM, Christopher wrote: On Wed, May 30, 2018 at 11:59 AM Josh Elser wrote: On 5/30/18 9:08 AM, Keith Turner wrote: On Wed, May 30, 2018 at 12:16 AM, Christopher wrote: I thought this was interesting: https://www.infoworld.com/article/3275924/java/oracle-plans-to-dump-risky-java-serialization.html If the long-term plan is to remove serialization from Java classes (in favor of a lightweight, possibly pluggable, "Records" serialization framework), we should begin thinking about how we use serialization in Accumulo's code today. At the very least, we should try to avoid any reliance on it in any future persistence of objects in Accumulo. If we see an opportunity to remove it in our current code anywhere, it might be worth spending the time to follow through with such a change. Of course, this is probably going to be a *very* long time before it is actually dropped from Java, but it's not going to hurt to start thinking about it now. (Accumulo uses Java serialization for storing FaTE transaction information, and perhaps elsewhere.) We currently do not support FaTE transactions across minor versions. The upgrade code checks for any outstanding FaTE transactions. So this makes it easier to upgrade on a minor version. I would like to see FaTE use a human readable format like Json because it would make debugging easier. I'd strongly suggest against using JSON as it forces the application to know how to handle drift in "schema". It would be nice to avoid the need to flush the outstanding fate txns on upgrade. If you just want a JSON-ish way to look at the data, I'd suggest moving over to protobuf3 and check out the support they have around JSON. https://developers.google.com/protocol-buffers/docs/proto3#json Protobuf certainly has better support for schemas... but I like the simplicity of using JSON directly and managing our own schema for FaTE to reduce dependencies. 
(Also, protobuf does not have a native Java compiler, AFAICT, which makes it a pain, similar to thrift, for portable code generation.) Whichever we choose, though, we've got plenty of time to hammer out these pros and cons, and experiment. Actually, you don't need to do a custom compiler installation for Protobuf3 on the majority of arches as there are compilers available via Maven central for protobuf on x86/64 and ppc. This is a non-issue for the majority of platforms. http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22protoc%22 Managing your own schema is silly when there are tools whose specific purpose in creation was "[schema management is hard on its own and we can make it easier with guiderails]". Smells like "not-invented-here" to me.
Re: Java (eventually) dropping Serialization
On 5/30/18 9:08 AM, Keith Turner wrote: On Wed, May 30, 2018 at 12:16 AM, Christopher wrote: I thought this was interesting: https://www.infoworld.com/article/3275924/java/oracle-plans-to-dump-risky-java-serialization.html If the long-term plan is to remove serialization from Java classes (in favor of a lightweight, possibly pluggable, "Records" serialization framework), we should begin thinking about how we use serialization in Accumulo's code today. At the very least, we should try to avoid any reliance on it in any future persistence of objects in Accumulo. If we see an opportunity to remove it in our current code anywhere, it might be worth spending the time to follow through with such a change. Of course, this is probably going to be a *very* long time before it is actually dropped from Java, but it's not going to hurt to start thinking about it now. (Accumulo uses Java serialization for storing FaTE transaction information, and perhaps elsewhere.) We currently do not support FaTE transactions across minor versions. The upgrade code checks for any outstanding FaTE transactions. So this makes it easier to upgrade on a minor version. I would like to see FaTE use a human readable format like Json because it would make debugging easier. I'd strongly suggest against using JSON as it forces the application to know how to handle drift in "schema". It would be nice to avoid the need to flush the outstanding fate txns on upgrade. If you just want a JSON-ish way to look at the data, I'd suggest moving over to protobuf3 and check out the support they have around JSON. https://developers.google.com/protocol-buffers/docs/proto3#json
Re: Use of Flush Table Operation
Can you give some more context? Strikes me as strange to be wanting to change a method which we want to remove (being deprecated). On 4/23/18 6:05 PM, Mike Miller wrote: Quick Survey: Does your project use the flush Table Operation in Accumulo? I am looking into changing the default behavior of the deprecated flush method[1] and was wondering if and how it is currently being used. Any response would be helpful. Thanks! [1] https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java#L471
Re: [DRAFT] [REPORT] Apache Accumulo - March 2018
On 4/9/18 12:40 PM, Michael Wall wrote: Thanks for reading this Josh. 1 - In my first draft I had no information about Tony but I felt like that was going to prompt questions. I agree contributor is more relevant but he is not listed on the people page. Do we think that matters? I don't think that matters. The contributors page is code-contributors, but the board certainly knows that such lists are not all-encompassing. 2 - What were you thinking on the "what's next"? Maybe it is time for us to really define a roadmap, not sure there is time before this is due on Wed though. My omission of specifics here was intentional ;) I don't have the time to devote meaningful cycles here, but I would agree with you that something should probably be documented. If nothing else, acknowledging that a roadmap is necessary is a sufficient improvement!
Re: [DRAFT] [REPORT] Apache Accumulo - March 2018
Two minor suggestions: * Strike the "A member of the Apache NiFi PMC" part. The fact that Tony was a NiFi PMC member isn't really relevant to the request, is it? IMO, if anything, it's more relevant that Tony is a contributor to Accumulo. * There isn't any discussion about what is coming next -- we have the 1.7.4 release, but no details about what's in the pipeline. On 4/7/18 10:19 AM, Michael Wall wrote: The Apache Accumulo PMC decided to draft its quarterly board reports on the dev list. Here is a draft of our report which is due by Wednesday, Apr 11, 1 week before the board meeting on Wednesday, Apr 18. Please let me know if you have any suggestions. I plan to submit it late on the 10th. Mike -- ## Description: - The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - There was 1 new release, Accumulo 1.7.4, since the last report. - There were 4 new committers since the last report. All committers are also PMC members. - The PMC decided to switch to using github issues for the project and all subprojects. - The PMC decided to drop support for Hadoop 2 in Accumulo 2.0. - A member of the Apache NiFi PMC requested permission to use the Accumulo logo on t-shirts and/or stickers to promote projects he uses. VP Brand Management and the PMC had no objections. ## Health report: - The project remains healthy. Activity levels on mailing lists, git and JIRA remain constant. ## PMC changes: - Currently 34 PMC members. - New PMC members: - Adam J. 
Shook was added to the PMC on Wed Jan 24 2018 - Mark Owens was added to the PMC on Tue Mar 20 2018 - Luis Tavarez was added to the PMC on Tue Mar 20 2018 - Nick Felts was added to the PMC on Thu Mar 22 2018 ## Committer base changes: - Currently 34 committers. - New committers: - Adam J. Shook was added as a committer on Wed Jan 24 2018 - Mark Owens was added as a committer on Wed Mar 21 2018 - Luis Tavarez was added as a committer on Wed Mar 21 2018 - Nick Felts was added as a committer on Sat Mar 24 2018 ## Releases: - accumulo-1.7.4 was released on Thu Mar 22 2018 ## Mailing list activity: - Nothing significant in the figures ## JIRA activity: - 65 JIRA tickets created in the last 3 months - 101 JIRA tickets closed/resolved in the last 3 months
Re: [VOTE] Accumulo 1.7.4-rc1
Yup, that's the only thing that would have come to my mind (hadoop bugs that have been long-fixed). On 3/22/18 10:07 PM, Christopher wrote: Yeah, I vaguely remember that now. It definitely seems to be the problem here. I believe it is one of the reasons we moved to 2.6.4 in 1.8+. I had forgotten about that. The 1.7 branch was still building with Hadoop 2.2.0 and I now do my testing with jdk8 only, so that's why it kept being a problem for me. Thanks for the info. I spent way too much time this week looking into this when all I needed to do was test with a newer version of Hadoop (or an older version of jdk), but at least I now know what the problem was. On Thu, Mar 22, 2018, 21:15 Billie Rinaldi <billie.rina...@gmail.com> wrote: On Thu, Mar 22, 2018 at 2:31 PM, Christopher <ctubb...@apache.org> wrote: Josh, I know you said you didn't have much time, but just in case you get a moment: do you know why `UserGroupInformation.isLoginKeytabBased()` might be false on the server side? This seems to be the root cause of the problems. I recall running into HADOOP-10786 a while ago ("in java 8 isKeyTab is always false given the current UGI implementation"). Not sure if that would be relevant here. https://github.com/apache/accumulo/blob/b0016c3ca36e15ee4bdde727ea5b6a18597de0ff/core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java#L383 On Thu, Mar 22, 2018 at 4:00 PM Josh Elser <els...@apache.org> wrote: I don't have the time to look at these right now. There isn't much special about how Accumulo uses Kerberos either. It's straightforward use via SASL with Thrift. I haven't looked at it since it was passing when I wrote it originally. On 3/20/18 2:32 PM, Christopher wrote: I'm currently looking at the KerberosRenewalIT failures that seem to be persisting across branches. From the logs, it looks like the accumulo services are trying to do ticket-cache based login renewals, instead of keytab-based renewals. 
This has been a problematic test for me before, and as such, I've gotten into the habit of ignoring it, but since I've not been able to get it to work on reruns, and it fails nearly 100% of the time (if not 100%) for me now, I decided to take a closer look. If it is doing ticket-cache based renewals, that could indicate a bug in the Kerberos authentication, and that would probably warrant a -1 from me... but I will continue to investigate first. Josh, you know more about the Kerberos stuff than anyone here, so if you have time/interest, I wouldn't mind getting your feedback on why this test might be failing for me. On Mon, Mar 19, 2018 at 3:44 PM Christopher <ctubb...@apache.org> wrote: Accumulo Developers, Please consider the following candidate for Accumulo 1.7.4. Git Commit: b2a59189108d736729432e81b3d5717000c6b891 Branch: 1.7.4-rc1 If this vote passes, a gpg-signed tag will be created using: git tag -f -m 'Apache Accumulo 1.7.4' -s rel/1.7.4 \ b2a59189108d736729432e81b3d5717000c6b891 Staging repo: https://repository.apache.org/content/repositories/orgapacheaccumulo-1068 Source (official release artifact): https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-src.tar.gz Binary: https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-bin.tar.gz (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.) All artifacts were built and staged with: mvn release:prepare && mvn release:perform Signing keys are available at https://www.apache.org/dist/accumulo/KEYS (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D) Release notes (in progress) can be found at: https://accumulo.apache.org/release/accumulo-1.7.4/ Please vote one of: [ ] +1 - I have verified and accept... [ ] +0 - I have reservations, but not strong enough to vote against... 
[ ] -1 - Because..., I do not accept... ... these artifacts as the 1.7.4 release of Apache Accumulo. This vote will remain open until at least Thu Mar 22 20:00:00 UTC 2018 (Thu Mar 22 16:00:00 EDT 2018 / Thu Mar 22 13:00:00 PDT 2018). Voting continues until the release manager sends an email closing the vote. Thanks! P.S. Hint: download the whole staging repo with wget -erobots=off -r -l inf -np -nH \ https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/ # note the trailing slash is needed
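The "verified" part of a +1 in the vote email above usually means checking the staged artifacts against their published hashes and signature. The following is only an illustration of the checksum-checking pattern: `accumulo-demo.tar.gz` is a stand-in file created locally, not a real release artifact, and a real check would run `sha1sum -c` against the artifact's downloaded `.sha1` companion and `gpg --verify` against its `.asc`.

```shell
# Stand-in for a downloaded artifact; a real verification would use
# accumulo-1.7.4-src.tar.gz and its detached .sha1 from the staging repo.
echo "demo artifact" > accumulo-demo.tar.gz

# Recreate the hash-file shape locally...
sha1sum accumulo-demo.tar.gz > accumulo-demo.tar.gz.sha1

# ...then verify the artifact against it, as a voter would.
sha1sum -c accumulo-demo.tar.gz.sha1
```

For the signature, the analogous step would be importing the KEYS file, running `gpg --verify` on the `.asc`, and comparing the signing key's fingerprint against the one stated in the vote email.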
Re: [VOTE] Accumulo 1.7.4-rc1
I don't have the time to look at these right now. There isn't much special about how Accumulo uses Kerberos either. It's straightforward use via SASL with Thrift. I haven't looked at it since it was passing when I wrote it originally. On 3/20/18 2:32 PM, Christopher wrote: I'm currently looking at the KerberosRenewalIT failures that seem to be persisting across branches. From the logs, it looks like the accumulo services are trying to do ticket-cache based login renewals, instead of keytab-based renewals. This has been a problematic test for me before, and as such, I've gotten into the habit of ignoring it, but since I've not been able to get it to work on reruns, and it fails nearly 100% of the time (if not 100%) for me now, I decided to take a closer look. If it is doing ticket-cache based renewals, that could indicate a bug in the Kerberos authentication, and that would probably warrant a -1 from me... but I will continue to investigate first. Josh, you know more about the Kerberos stuff than anyone here, so if you have time/interest, I wouldn't mind getting your feedback on why this test might be failing for me. On Mon, Mar 19, 2018 at 3:44 PM Christopher wrote: Accumulo Developers, Please consider the following candidate for Accumulo 1.7.4. Git Commit: b2a59189108d736729432e81b3d5717000c6b891 Branch: 1.7.4-rc1 If this vote passes, a gpg-signed tag will be created using: git tag -f -m 'Apache Accumulo 1.7.4' -s rel/1.7.4 \ b2a59189108d736729432e81b3d5717000c6b891 Staging repo: https://repository.apache.org/content/repositories/orgapacheaccumulo-1068 Source (official release artifact): https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-src.tar.gz Binary: https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-bin.tar.gz (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.) 
All artifacts were built and staged with: mvn release:prepare && mvn release:perform Signing keys are available at https://www.apache.org/dist/accumulo/KEYS (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D) Release notes (in progress) can be found at: https://accumulo.apache.org/release/accumulo-1.7.4/ Please vote one of: [ ] +1 - I have verified and accept... [ ] +0 - I have reservations, but not strong enough to vote against... [ ] -1 - Because..., I do not accept... ... these artifacts as the 1.7.4 release of Apache Accumulo. This vote will remain open until at least Thu Mar 22 20:00:00 UTC 2018 (Thu Mar 22 16:00:00 EDT 2018 / Thu Mar 22 13:00:00 PDT 2018). Voting continues until the release manager sends an email closing the vote. Thanks! P.S. Hint: download the whole staging repo with wget -erobots=off -r -l inf -np -nH \ https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/ # note the trailing slash is needed
Re: [DISCUSS] Remove tracer service (not instrumentation)
That was my expectation on how it would work. My +1 was the idea of moving the Tracer to a separate service and having clear instructions for how users get back to the current functionality (how these two repositories get deployed), *before* it's removed from core Accumulo. This is because of the very clear testimony from a user about how useful the feature was to them. On 3/16/18 7:36 PM, Christopher wrote: Would you both (Michael and Josh) be okay with moving it to a separate repo within the Accumulo project rather than ripping it out and leaving it only buried in git history? On Fri, Mar 16, 2018 at 7:15 PM Josh Elser <els...@apache.org> wrote: I think I'm in agreement with this subset of Mikes. I like the idea long-term. The tracing service is "add-on", and can live outside Accumulo. I don't like the idea of moving the code out and taking away code which is functional today. I am +1 on the idea of building the same functionality outside of the core product. I am -1 on removing the functionality in the core product until the replacement is ready (e.g. clear docs for users covering how they get back to "normal"). On 3/16/18 6:49 PM, Michael Wall wrote: Yeah, I get it. That should have said "without a working example alternative". Something to make it as easy as possible for someone currently using tracing to not lose functionality. Thanks On Fri, Mar 16, 2018, 18:38 Christopher <ctubb...@apache.org> wrote: The alternative is to configure any of the other HTrace sinks which are available. The current code for Accumulo's tracer service could even be forked and supported as a separate sink to optionally use (but as I said in my original email, I think it'd be better to encourage contribution to other presentation projects to use Accumulo as a backing store). On Fri, Mar 16, 2018 at 6:34 PM Michael Wall <mjw...@apache.org> wrote: I am in favor of removing the tracer ui from the monitor and the tracer service that stores the spans in Accumulo. 
I worry about doing so with a working alternative though. On Fri, Mar 16, 2018 at 6:25 PM Mike Drob <md...@apache.org> wrote: Do we have a migration story ready to go for folks that are used to seeing traces on the monitor? On Fri, Mar 16, 2018 at 5:17 PM, Tony Kurc <trk...@gmail.com> wrote: I like this idea. On Fri, Mar 16, 2018 at 5:09 PM, Christopher <ctubb...@apache.org> wrote: Devs, (This discussion is somewhat of a spinoff of our previous recent conversation about HTrace, but I'd like to narrow the discussion to one specific topic regarding our tracer service.) I'd like to remove Accumulo's tracer service and corresponding presentations in the monitor for 2.0. The tracer service currently acts as a sink for the traces from Accumulo. While there is interest in tracing Accumulo, and Accumulo may itself be suitable (with the right schema) for storing traces, I do not think acting as a "trace sink" is really the kind of thing we should be doing as part of Accumulo's out-of-the-box core functionality. Also, the presentation and search capabilities of the traces found in the trace table (by convention, and assumed by the monitor) are far from an ideal presentation of this data, and I don't think the Accumulo project should continue maintaining that inside the core project's monitor, either. I think we should encourage interested volunteers to contribute to other trace presentation software (wherever they may exist) any necessary "backing store" implementation based on Accumulo. None of this would remove tracing instrumentation from Accumulo... it would just require users interested in trace data from Accumulo to configure an appropriate sink to collect that data in some other integrated component of their overall architecture. Decoupling the integrated trace sink from the instrumentation in Accumulo like this could even be a step towards providing support for multiple different tracing libraries. 
(I guess we could do this now, but it would be easier if we were not also trying to provide a sink implementation for one specific version of one specific instrumentation library.) Thoughts?
Re: [DISCUSS] Remove tracer service (not instrumentation)
I think I'm in agreement with this subset of Mikes. I like the idea long-term. The tracing service is "add-on", and can live outside Accumulo. I don't like the idea of moving the code out and taking away code which is functional today. I am +1 on the idea of building the same functionality outside of the core product. I am -1 on removing the functionality in the core product until the replacement is ready (e.g. clear docs for users covering how they get back to "normal"). On 3/16/18 6:49 PM, Michael Wall wrote: Yeah, I get it. That should have said "without a working example alternative". Something to make it as easy as possible for someone currently using tracing to not lose functionality. Thanks On Fri, Mar 16, 2018, 18:38 Christopher wrote: The alternative is to configure any of the other HTrace sinks which are available. The current code for Accumulo's tracer service could even be forked and supported as a separate sink to optionally use (but as I said in my original email, I think it'd be better to encourage contribution to other presentation projects to use Accumulo as a backing store). On Fri, Mar 16, 2018 at 6:34 PM Michael Wall wrote: I am in favor of removing the tracer ui from the monitor and the tracer service that stores the spans in Accumulo. I worry about doing so with a working alternative though. On Fri, Mar 16, 2018 at 6:25 PM Mike Drob wrote: Do we have a migration story ready to go for folks that are used to seeing traces on the monitor? On Fri, Mar 16, 2018 at 5:17 PM, Tony Kurc wrote: I like this idea. On Fri, Mar 16, 2018 at 5:09 PM, Christopher wrote: Devs, (This discussion is somewhat of a spinoff of our previous recent conversation about HTrace, but I'd like to narrow the discussion to one specific topic regarding our tracer service.) I'd like to remove Accumulo's tracer service and corresponding presentations in the monitor for 2.0. The tracer service currently acts as a sink for the traces from Accumulo. 
While there is interest in tracing Accumulo, and Accumulo may itself be suitable (with the right schema) for storing traces, I do not think acting as a "trace sink" is really the kind of thing we should be doing as part of Accumulo's out-of-the-box core functionality. Also, the presentation and search capabilities of the traces found in the trace table (by convention, and assumed by the monitor) are far from an ideal presentation of this data, and I don't think the Accumulo project should continue maintaining that inside the core project's monitor, either. I think we should encourage interested volunteers to contribute to other trace presentation software (wherever they may exist) any necessary "backing store" implementation based on Accumulo. None of this would remove tracing instrumentation from Accumulo... it would just require users interested in trace data from Accumulo to configure an appropriate sink to collect that data in some other integrated component of their overall architecture. Decoupling the integrated trace sink from the instrumentation in Accumulo like this could even be a step towards providing support for multiple different tracing libraries. (I guess we could do this now, but it would be easier if we were not also trying to provide a sink implementation for one specific version of one specific instrumentation library.) Thoughts?
Re: [VOTE] Switch to GitHub issues
+0 since there seems to be such a strong desire to use this that I just don't quite understand :) Thanks to those who worked to clarify the ambiguity/issues that I was worried about previously. On 3/13/18 12:53 PM, Keith Turner wrote: Accumulo PMC, Please vote on initiating the transition from JIRA to GitHub. The purpose of this vote is to see if there is agreement on the following three items the community discussed. * Using this workflow initially : https://github.com/apache/accumulo-website/pull/59 * The ability to modify the workflow via lazy consensus through discussions on the dev list. * The goal of only using one issue tracker after a transition period in which two are used. If the vote passes, I will ask Infra to enable Github Issues and then merge the website PR. This vote will be open through at least Fri Mar 16 16:45:00 UTC 2018 (Fri Mar 16 12:45:00 EDT 2018 / Fri Mar 16 09:45:00 PDT 2018)
Re: Accumulo stickers or t-shirts?
IIRC, as long as you have PMC approval and you're not profiting off of the swag, it's ok from the ASF point of view. I can't find ASF trademarks documentation on it at the moment, however. On 3/11/18 12:52 PM, Tony Kurc wrote: Hi, I was wondering if anyone had ever designed and printed either stickers or t-shirts (or other cool stuff) for Apache Accumulo. I am likely to head to a conference soon and would like to have some things for trading and/or giveaways for the project. If there were some things that are already designed that are already "blessed" by the project, that would be awesome. If there aren't, do you all have a process for someone vetting a proposed design? Tony
Re: [DISCUSS] Switch to GitHub issues after trial
Sorry for the top-post. I really appreciate the numbered list below, Keith. Specifically the answers to #1 and #4 make me very happy. I think #5 needs to be a little more concrete (IMO, you should just decide what it should be). #6: +1 to a message to private; this is how Apache generally requests this be done. While I can appreciate your stance on #3 and I think I would not call it a blocker either, this is probably something worth the 15-30 minutes of investigation. Sean/Mike may feel more strongly than I do. Learning from others, even if it is just dropping an email to dev@spark directly to ask the question, goes a long way. On 3/7/18 10:55 AM, Keith Turner wrote: On Mon, Mar 5, 2018 at 6:07 PM, Keith Turner <ke...@deenlo.com> wrote: On Thu, Feb 15, 2018 at 12:52 PM, Josh Elser <els...@apache.org> wrote: -0 as an initial reaction because I'm still not convinced that GH issues provides any additional features or better experience than JIRA does, and this change would only serve to fragment an already bare community. My concerns that would push that -0 to a -1 include (but aren't limited to): * Documentation/website update for the release process * Validation that our release processes on JIRA have similar functionality on GH issues * Updated contributor docs (removing JIRA related content, add an explanation as to the change) * CONTRIBUTING.md updated on relevant repos I opened the following PR with a proposal for how we could start using github. https://github.com/apache/accumulo-website/pull/59 There were lots of valid concerns raised during this discussion. The concerns shaped the proposal I submitted. Rather than reply to them individually in different emails I am collecting them all here and sharing my thoughts about them. 1. How do we release? JIRA is used in three important ways for releases: setting blockers, triaging issues, and generating release notes. I think the proposal addresses all three. 2. Will we document contributor guidelines to avoid confusion? 
What is expected of contributors is clearly documented. 3. Can someone investigate how Spark operates before switching? That would be great if someone volunteered to do this and wrote up their findings. However if no one volunteers, then I do not think this should be a blocker. There are many other projects that would be worthy of investigation also. 4. What is the migration plan for existing issues? Will we have split issue trackers for years? The proposal documents migrating existing JIRA issues as they are worked. This means that existing JIRA issues that are never worked will never migrate. After all branches are released, JIRA can be put in read only mode (only PMC can change it). It will be left active for reference and migration of existing issues. 5. How will we handle fix versions? The proposal suggests using issue labels in github for this. It also suggests using a prefix on fix version labels to make them sort last. 6. How will we handle security issues? We need to clearly document on our website how users should report security issues. I am not sure this is done at the moment. Since this is infrequent I think we can handle this on the private list. I think our workflow should be optimized for frequent actions and not infrequent ones. 7. Should we switch all repos to GH issues except Accumulo core? I think this is a good example of how design by committee can go wrong. This is a really confusing solution that does not improve our workflow, so the benefits are not clear to me. - Josh On 2/15/18 12:05 PM, Mike Walch wrote: I would like to open discussion on moving from Jira to GitHub issues. GitHub issues would be enabled for a trial period. After this trial period, the project would either move completely to GitHub issues or keep using Jira. Two issue trackers would not be used after the trial period.
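The prefix idea in item 5 can be seen with a one-liner. The `v:` prefix below is only an assumed example (the proposal doesn't fix a specific prefix, and the other label names are made up); the point is that prefixed fix-version labels group together and sort after typical lowercase label names in an alphabetical label list:

```shell
# Ordinary labels mixed with hypothetical "v:"-prefixed fix-version labels;
# sorting shows the version labels clustering at the end of the list.
printf '%s\n' bug enhancement v:1.9.0 duplicate v:2.0.0 | sort
```

Any prefix that collates after the project's ordinary label vocabulary would have the same grouping effect.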
Re: [DISCUSS] status of Hadoop 3 for 1.9.0 release
Yeah, if Hadoop has changed their stance, propagating a "use at your own risk" would be sufficient from our end. On 3/1/18 6:06 PM, Christopher wrote: If there's a risk, I'd suggest calling things out as "experimental" in the release notes, and encourage users to try it and give us feedback. On Thu, Mar 1, 2018 at 5:10 PM Sean Busbey wrote: hi folks! While reviewing things in prep for getting our master branch over to apache hadoop 3 only (see related discussion [1]), I noticed some wording on the last RC[2] for Hadoop 3.0.1: Please note: * HDFS-12990. Change default NameNode RPC port back to 8020. It makes incompatible changes to Hadoop 3.0.0. After 3.0.1 releases, Apache Hadoop 3.0.0 will be deprecated due to this change. Hadoop 3.0.0 was a production-ready release; the community did an extended set of alpha/beta releases to shake out the kinds of things that would have required labeling the X.Y.0 release as non-production in previous Hadoop 2 release lines. Deprecating it is a pretty strong signal, but from the extended discussion[3] it seems to me that this isn't meant to indicate that the entire 3.0 release line will stop. What do folks think? - No problem from our perspective? - Worth waiting to ship a Hadoop 3 ready release until Hadoop 3.0.1 comes out? - Worth waiting to ship a Hadoop 3 ready release until Hadoop 3.1.0 comes out? - Leave things as-is and give a word of warning for would-be early adopters in our release notes? - Expressly call things out in our release notes as "experimental" and we might make changes once later Hadoop 3s come out? [1]: https://s.apache.org/pOKv [2]: https://s.apache.org/brE4 [3]: https://s.apache.org/BWd6
Re: [DISCUSS] Switch to GitHub issues after trial
After the rest of the discussion, I feel like I need to be explicit (so, I'm sorry if I'm being pedantic and we're already in agreement here): You're planning to document how GitHub tech would be used to make releases on these repositories? And, we're in agreement that JIRA would not be used at all for these repositories? In short, +0 as long as the process for releasing software is clear, I don't have issues with the process using different tooling than we presently use (although, still don't see the benefit to changing). On 3/1/18 2:41 PM, Mike Walch wrote: I would like to start up this discussion again. I don't think we have reached consensus on moving the primary Accumulo repo to GitHub issues. The primary repo has common workflows (i.e. creating issues that affect multiple versions) that don't easily transition to GitHub issues. I have heard several solutions but no consensus. As for moving our secondary repos (listed below), this seems much easier and I haven't heard any concerns so far. Does anyone have concerns about moving these repos? https://github.com/apache/accumulo-docker https://github.com/apache/accumulo-examples https://github.com/apache/accumulo-testing https://github.com/apache/accumulo-website https://github.com/apache/accumulo-wikisearch On Fri, Feb 16, 2018 at 10:54 AM, Sean Busbey wrote: On Fri, Feb 16, 2018 at 9:27 AM, Mike Walch wrote: Some of the concerns brought up would be answerable with a trial. How do we do a release? What does aggregating issues fixed in a particular version look like? You can tag GH issues with a version but I think it's best to just go through commit history to compile the release notes. This should already be done as there is no guarantee even with Jira that all issues were labeled correctly. If you are using GitHub issues, all issue numbers in commits link back to the issue or pull request which we don't have with Jira right now. This gets to an issue I have. 
What's our source of truth about "X is fixed in Y" during the trial? I have been assuming that JIRA is currently our source of truth, but maybe that's wrong. Is it the release notes? IMHO, Git is a poor choice for the source of truth due to the immutability of commit messages, at least in ASF contexts since we can't do force pushes (in at least some branches). -- busbey
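The "go through commit history" approach mentioned in the thread can be sketched with plain git. This throwaway-repo demo only illustrates the pattern: the `rel/*` tag names mirror the convention seen elsewhere in this archive's vote emails, and the ACCUMULO-XXXX commit subjects here are made up for the example.

```shell
# Build a throwaway repo with one commit before and one after a release tag.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'ACCUMULO-1234 fix scan hang'
git tag rel/1.7.3
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'ACCUMULO-5678 fix keytab renewal'

# Candidate release-notes input: only commits since the previous release tag.
git log --oneline rel/1.7.3..HEAD
```

This is one version of "git as the source of truth": whatever landed on the branch between two release tags is, by construction, what the release contains, though as noted above it depends on commit messages being labeled consistently.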
Re: [DISCUSS] tracing framework updates
Thanks for letting us know, Tony. I can totally understand how the server-side tracing (and collapsing it can do) would be super-helpful in figuring out what's happening. I read that as one reason for simply not trying to get HDFS and Accumulo re-sync'ed. I think we have value in leaving what we presently have in Accumulo now over removing it completely. On 2/27/18 8:50 PM, Tony Kurc wrote: Josh, It was exclusively the first - using the traces in the server-side code. The most common case is "I have a scan which is much slower than expected", and couldn't figure out why. I'm trying to think of alternative approaches to using the traces, and honestly, doing a bunch of log aggregation is the alternative I'd have to fall back to, and in some cases recompiling parts of accumulo with new log messages in place. Tony On Tue, Feb 27, 2018 at 7:18 PM, Josh Elser <els...@apache.org> wrote: Oh, that's a pleasant surprise to hear, actually. Anything you can share with the class, Tony? Would love to hear (even if brief) how it was used and benefited you. Specifically, I'm curious if... * You looked at traces from our server-side instrumented code * You instrumented your own code outside of Accumulo and used Accumulo as the backing store * You instrumented code inside/outside Accumulo and benefited from the server-side instrumentation (e.g. your code's spans collapsing with the server's spans) On 2/27/18 6:52 PM, Tony Kurc wrote: I'd personally be disappointed to see it removed. There is a bit of a learning curve and startup cost to use it now, but when diagnosing major challenges, it has been an invaluable capability. On Feb 27, 2018 3:15 PM, "Josh Elser" <els...@apache.org> wrote: Wow... that's, erm, quite the paper. Nothing like taking some pot-shots at another software project and quoting folks out of context. Does it help to break down the problem some more? * Is Accumulo getting benefit from tracing its library? 
* Is Accumulo getting benefit from tracing context including HDFS calls? I feel like it is a nice tool to have in your toolbelt (having used it successfully in the past), but I wonder if it's the most effective thing to keep inside of Accumulo. Specifically, would it be better to just pull this out of Accumulo outright? I don't think I have an opinion yet. On 2/27/18 1:08 PM, Ed Coleman wrote: For general discussion - Facebook recently (Oct 28, 2017) published a paper on tracing: Canopy: An End-to-End Performance Tracing and Analysis System (https://research.fb.com/publications/canopy-end-to-end- performance-tracing-at-scale/) As a bonus, they referenced Accumulo and HTrace in section 2.2 "Mismatched models affected compatibility between mixed system versions; e.g. Accumulo and Hadoop were impacted by the “continued lack of concern in the HTrace project around tracing during upgrades” -Original Message- From: Tony Kurc [mailto:tk...@apache.org] Sent: Tuesday, February 27, 2018 12:57 PM To: dev@accumulo.apache.org Subject: Re: [DISCUSS] tracing framework updates I have some experience with opentracing, and it definitely seems promising, however, potentially promising in the same way htrace was... That being said, I did a cursory thought exercise of what it would take to do a swap of the current tracing in accumulo to opentracing, and I didn't come across any hard problems, meaning it could be a fairly straightforward refactor. I was hoping to explore the community a bit more at some upcoming conferences On Feb 27, 2018 11:59 AM, "Sean Busbey" <bus...@apache.org> wrote: On 2018/02/27 16:39:02, Christopher <ctubb...@apache.org> wrote: I didn't realize HTrace was struggling in incubation. Maybe some of us can start participating? The project did start within Accumulo, after all. What does it need? I also wouldn't want to go back to maintaining cloudtrace. I suspect it's too late for HTrace. The last commit to the main development branch was May 2017. 
They had a decent run of activity in 2015 and an almost-resurgence in 2016, but they never really got enough community traction to survive the normal ebb and flow of contributor involvement. They need the things any project needs to be sustainable: regular release cadences, a responsive contribution process, and folks to do the long slog of building interest via e.g. production adoption. I'm unfamiliar with OpenTracing, but it was my understanding that Zipkin was more of a tracing sink than an instrumentation API. HTrace is actually listed as an instrumentation library for Zipkin (among others). I think the key is that for an instrumentation library to get adoption it needs a good sink that provides utility to operators looking to diagnose problems. It took too long for HTrace to provide any tooling that could help with even simple performance profiling. Maybe hooking it into Zipkin would get around that. Personally, I never managed to get the two to actually work together. My listing Zipkin as an option merely reflects my prioritization of practical impact of whatever we go to. I don't want to adopt some blue-sky effort. FWIW, OpenTracing docs at least claim to also provide a zipkin-sink compatible runtime. There's a whole community that just does distributed monitoring, maybe someone has time to survey some spaces and see if OpenTracing has any legs.
Re: [DISCUSS] tracing framework updates
Oh, that's a pleasant surprise to hear, actually. Anything you can share with the class, Tony? Would love to hear (even if brief) how it was used and benefited you. Specifically, I'm curious if... * You looked at traces from our server-side instrumented code * You instrumented your own code outside of Accumulo and used Accumulo as the backing store * You instrumented code inside/outside Accumulo and benefited from the server-side instrumentation (e.g. your code's spans collapsing with the server's spans) On 2/27/18 6:52 PM, Tony Kurc wrote: I'd personally be disappointed to see it removed. There is a bit of a learning curve and startup cost to use it now, but when diagnosing major challenges, it has been an invaluable capability. On Feb 27, 2018 3:15 PM, "Josh Elser" <els...@apache.org> wrote: Wow... that's, erm, quite the paper. Nothing like taking some pot-shots at another software project and quoting folks out of context. Does it help to break down the problem some more? * Is Accumulo getting benefit from tracing its library? * Is Accumulo getting benefit from tracing context including HDFS calls? I feel like it is a nice tool to have in your toolbelt (having used it successfully in the past), but I wonder if it's the most effective thing to keep inside of Accumulo. Specifically, would it be better to just pull this out of Accumulo outright? I don't think I have an opinion yet. On 2/27/18 1:08 PM, Ed Coleman wrote: For general discussion - Facebook recently (Oct 28, 2017) published a paper on tracing: Canopy: An End-to-End Performance Tracing and Analysis System (https://research.fb.com/publications/canopy-end-to-end- performance-tracing-at-scale/) As a bonus, they referenced Accumulo and HTrace in section 2.2 "Mismatched models affected compatibility between mixed system versions; e.g. 
Accumulo and Hadoop were impacted by the “continued lack of concern in the HTrace project around tracing during upgrades” -Original Message- From: Tony Kurc [mailto:tk...@apache.org] Sent: Tuesday, February 27, 2018 12:57 PM To: dev@accumulo.apache.org Subject: Re: [DISCUSS] tracing framework updates I have some experience with opentracing, and it definitely seems promising, however, potentially promising in the same way htrace was... That being said, I did a cursory thought exercise of what it would take to do a swap of the current tracing in accumulo to opentracing, and I didn't come across any hard problems, meaning it could be a fairly straightforward refactor. I was hoping to explore the community a bit more at some upcoming conferences On Feb 27, 2018 11:59 AM, "Sean Busbey" <bus...@apache.org> wrote: On 2018/02/27 16:39:02, Christopher <ctubb...@apache.org> wrote: I didn't realize HTrace was struggling in incubation. Maybe some of us can start participating? The project did start within Accumulo, after all. What does it need? I also wouldn't want to go back to maintaining cloudtrace. I suspect it's too late for HTrace. The last commit to the main development branch was May 2017. They had a decent run of activity in 2015 and an almost-resurgence in 2016, but they never really got enough community traction to survive the normal ebb and flow of contributor involvement. They need the things any project needs to be sustainable: regular release cadences, a responsive contribution process, and folks to do the long slog of building interest via e.g. production adoption. I'm unfamiliar with OpenTracing, but it was my understanding that Zipkin was more of a tracing sink than an instrumentation API. HTrace is actually listed as an instrumentation library for Zipkin (among others). I think the key is that for an instrumentation library to get adoption it needs a good sink that provides utility to operators looking to diagnose problems. 
It took too long for HTrace to provide any tooling that could help with even simple performance profiling. Maybe hooking it into Zipkin would get around that. Personally, I never managed to get the two to actually work together. My listing Zipkin as an option merely reflects my prioritization of practical impact of whatever we go to. I don't want to adopt some blue-sky effort. FWIW, OpenTracing docs at least claim to also provide a zipkin-sink compatible runtime. There's a whole community that just does distributed monitoring, maybe someone has time to survey some spaces and see if OpenTracing has any legs.
Re: [DISCUSS] dropping hadoop 2 support
+1 AFAIK, this wouldn't have to be anything more than build changes. "Dropping hadoop2 support" wouldn't need to include any other changes (as adding H3 support didn't require any Java changes). Getting in front of the ball to help push people towards newer versions would be a welcome change. On 2/27/18 10:42 AM, Sean Busbey wrote: Let's get the discussion started early on when we'll drop hadoop 2 support. As of ACCUMULO-4826 we are poised to have Hadoop 2 and Hadoop 3 supported in 1.y releases as of 1.9.0. That gives an upgrade path so that folks won't have to upgrade both Hadoop and Accumulo at the same time. How about Accumulo 2.0.0 requires Hadoop 3? If there's a compelling reason for our users to stay on Hadoop 2.y releases, we can keep making Accumulo 1.y releases. Due to the shift away from maintenance releases in Hadoop we'll need to get more aggressive in adopting minor releases.
Re: [DISCUSS] Proposed formatter change: 100 char lines
+1 to not changing min-Java on the release lines that supported Java 7. Let's just cease activity on these branches instead :) On 2/16/18 9:55 AM, Sean Busbey wrote: I'm opposed to requiring Java 8 to build on branches that we claim support running under Java 7. Historically relying on "compile for earlier target JDK" has just led to pain down the road when it inevitably doesn't work. Just make it a recommendation for contributions and have our precommit checks do the build with Java 8 to verify the formatting has already happened. On Thu, Feb 15, 2018 at 10:24 PM, Christopher wrote: Primarily for accessibility reasons (screen space with a comfortable font), but also to support readability for devs working on sensibly-sized screens, I want to change our formatter to format with 100 char line length instead of its current 160. Many of our files need to be reformatted anyway, because the current formatter is configured incorrectly for Java 8 lambda syntax and needs to be fixed, so this might be a good opportunity to make the switch. Also, at this point I think it is sensible to require Java 8 to build Accumulo... even when building older branches. (Accumulo 1.x will still support running on Java 7, of course, but Java 8 would be required to build it). The reason for this requirement is that in order to reduce merge conflicts and merge bugs between branches, I'd like to update the formatting across all branches, but the formatter which supports this syntax requires Java 8 to run. The alternative to requiring Java 8 would be to only run the formatter when building with Java 8... and skip formatting if building with Java 7, which might result in some unformatted contributions, depending on the JRE version used to build.
Re: [DISCUSS] Release 1.7.4 and the 1.9.0
SGTM On 2/15/18 11:10 PM, Ed Coleman wrote: I'd like to propose that we start the release process for 1.7.4 and then 1.9.0. I'm willing to be the release manager for both if that would facilitate things. As a strawman - I propose: March 1st - we start the formal release process of 1.7.4, with a goal that it would be complete and released around March 15th. This would be the last planned release of the 1.7.x line. March 19th we start the formal release process of 1.9.0. My real objective is to get a release of 1.9.0 that would be mostly equivalent to what would have been a 1.8.2, with the API changes for configuration to support Hadoop 3. There seem to be some fixes in 1.8.1 that I'd like to see released and Keith Turner seems to be making some substantial fixes to performance issues that I'd hope to be able to take advantage of - however, I would like to have a bound to help limit upgrade risks. The dates are just a starting point for discussion - if Keith has additional fixes that we'd like to get in, but needs additional time that's fine with me, I'm really just pushing for sooner rather than later. Ed Coleman
Re: [DISCUSS] Switch to GitHub issues after trial
On 2/15/18 6:18 PM, Christopher wrote: On Thu, Feb 15, 2018 at 5:08 PM Josh Elser <els...@apache.org> wrote: On 2/15/18 4:56 PM, Christopher wrote: On Thu, Feb 15, 2018 at 4:55 PM Josh Elser <els...@apache.org> wrote: On 2/15/18 4:17 PM, Mike Drob wrote: What do we do if the trial is wildly successful? Is there a migration plan for our currently open issues? We have almost 1000 of them. As Keith said in the other thread, we don't need to have all the answers up front. You're right, we don't need to have all of the answers up front. This is one that I'd like to have some thought put into though. There's lots of things that are fine to handle as we approach it, but this one seems like it will lead to us having split issue trackers for _years_ down the road. This is a good point I hadn't yet considered. There's not only the migration question that eventually needs to be answered, but an immediate question of how will we determine when we can release a version of Accumulo? Are there conventions/features on the GH issues side that will provide some logical analog to the fixVersion of JIRA? These are all great questions... that could be answered with a trial... Shall I assume then that you are volunteering to handle all issue management across the disparate systems for all releases? A trial is a good idea to determine _if we like the system_ and want to migrate to it. It's not a substitute for determining if the system is _viable_. I'm of a different opinion: I already know I like GitHub issues and want to migrate to it. What I don't know is if it is viable for Accumulo's needs. Glad you like GH issues, but that isn't what is being discussed here. The matter at hand is figuring out the logistics of *how* we move to a different issue tracker in a manner that doesn't derail the project management of a fully-distributed team. 
I'm worried because I feel like there are valid concerns being brought up here without acknowledgement of the impact on those who only participate with Accumulo digitally.
Re: [DISCUSS] Switch to GitHub issues after trial
On 2/15/18 4:56 PM, Christopher wrote: On Thu, Feb 15, 2018 at 4:55 PM Josh Elser <els...@apache.org> wrote: On 2/15/18 4:17 PM, Mike Drob wrote: What do we do if the trial is wildly successful? Is there a migration plan for our currently open issues? We have almost 1000 of them. As Keith said in the other thread, we don't need to have all the answers up front. You're right, we don't need to have all of the answers up front. This is one that I'd like to have some thought put into though. There's lots of things that are fine to handle as we approach it, but this one seems like it will lead to us having split issue trackers for _years_ down the road. This is a good point I hadn't yet considered. There's not only the migration question that eventually needs to be answered, but an immediate question of how will we determine when we can release a version of Accumulo? Are there conventions/features on the GH issues side that will provide some logical analog to the fixVersion of JIRA? These are all great questions... that could be answered with a trial... Shall I assume then that you are volunteering to handle all issue management across the disparate systems for all releases? A trial is a good idea to determine _if we like the system_ and want to migrate to it. It's not a substitute for determining if the system is _viable_.
Re: [DISCUSS] Switch to GitHub issues after trial
On 2/15/18 4:17 PM, Mike Drob wrote: What do we do if the trial is wildly successful? Is there a migration plan for our currently open issues? We have almost 1000 of them. As Keith said in the other thread, we don't need to have all the answers up front. You're right, we don't need to have all of the answers up front. This is one that I'd like to have some thought put into though. There's lots of things that are fine to handle as we approach it, but this one seems like it will lead to us having split issue trackers for _years_ down the road. This is a good point I hadn't yet considered. There's not only the migration question that eventually needs to be answered, but an immediate question of how will we determine when we can release a version of Accumulo? Are there conventions/features on the GH issues side that will provide some logical analog to the fixVersion of JIRA?
Re: [DISCUSS] Switch to GitHub issues after trial
-0 as an initial reaction because I'm still not convinced that GH issues provides any additional features or better experience than JIRA does, and this change would only serve to fragment an already bare community. My concerns that would push that -0 to a -1 include (but aren't limited to): * Documentation/website update for the release process * Validation that our release process on JIRA has similar functionality on GH issues * Updated contributor docs (removing JIRA-related content, adding an explanation as to the change) * CONTRIBUTING.md updated on relevant repos - Josh On 2/15/18 12:05 PM, Mike Walch wrote: I would like to open discussion on moving from Jira to GitHub issues. GitHub issues would be enabled for a trial period. After this trial period, the project would either move completely to GitHub issues or keep using Jira. Two issue trackers would not be used after trial period.
Re: Additional options for issue tracking
On 2/15/18 12:28 PM, Christopher wrote: Want to spin out a DISCUSS on the desire to switch, Mike Walch? That seems to me like it should be the next step. I thought that's what we were doing. :) This isn't tagged with DISCUSS in the subject (which I know some subscribers of our list filter on) and this thread is convoluted already. The intent of this discussion isn't cut-and-dried like it could be.
Re: Additional options for issue tracking
On 2/15/18 11:26 AM, Keith Turner wrote: On Thu, Feb 15, 2018 at 11:01 AM, Josh Elser <els...@apache.org> wrote: We tell users that try to file issues on the "unsupported" issue tracker that they've created the issue in the wrong place and point them to the right issue tracker. Personally I think that is ok for a short period. It's like driving on a road during construction, you know annoyance is unavoidable. However no one wants to drive on a road that is under construction indefinitely. So if we start on this I would like consensus that we plan to transition from Jira to Github in a timely manner. I don't think we should try to figure everything out before we start though. I think it would be good to have a simple starting plan and we hill climb from there in search of a more optimal way of operating. I don't like the idea of enabling github issues with no consensus that the goal is to transition away from Jira. Leaving things in that state for a long period seems bad to me. If we start with the consensus to transition, it's possible we may decide not to and that's ok. I don't think any action needs to be taken now for that eventuality. We can figure that out as we go during the transition period. +1 on all of this. Want to spin out a DISCUSS on the desire to switch, Mike Walch? That seems to me like it should be the next step.
Re: Additional options for issue tracking
We tell users that try to file issues on the "unsupported" issue tracker that they've created the issue in the wrong place and point them to the right issue tracker. On 2/14/18 10:29 PM, Christopher wrote: Can you elaborate on what kind of human controls you mean? What if a user finds the GH issues and creates an issue there? What action should the developers take? On Wed, Feb 14, 2018 at 10:27 PM Josh Elser <els...@apache.org> wrote: I didn't ask for automated controls here -- human controls are fine. I have already said I am -1 on two concurrent issue trackers. If developers want to evaluate them, that's fine. On 2/14/18 10:12 PM, Christopher wrote: I don't think we have the ability to lock out non-committers from creating new GH issues if we enable them, nor do I think it would make sense to do so, since that's a valuable use case to consider during any trial period before shutting off JIRA. As for switching immediately to GH issues for non-primary repo (-website, -examples, -docker, etc...) I think that makes sense since those are already confusing when filed in the JIRA mixed in with the main repo's issues. On Wed, Feb 14, 2018 at 9:25 PM Josh Elser <els...@apache.org> wrote: I am OK with committers ONLY using GH issues on all repos (with clear guidance as to what the heck the project is doing) or doing a full-switch on the other repos. On 2/14/18 7:00 PM, Mike Miller wrote: We could do a trial period of GitHub issues for the accumulo sub-repos (accumulo-website, accumulo-examples...) then after a month or two decide to switch or not. That way we won't have duplicate issues or the confusion of having 2 trackers for one repository. On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote: +1 I think it makes sense to try out GitHub before shutting off JIRA. This period could be limited to a month or two. 
On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote: What if we had an interim transition period, tentatively using GitHub to determine its suitability for our workflows, and shut off JIRA later? On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote: I disagree with Mike in that I don't find JIRA to be so painful that it necessitates us changing, but I wouldn't block a move to GH issues if we turn off our JIRA use. On 2/14/18 4:29 PM, Mike Drob wrote: @josh - How do you feel about a move from JIRA to GH Issues completely? On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote: I believe I already stated -1 the last time this was brought up. Using two issue trackers is silly. On 2/14/18 3:30 PM, Mike Walch wrote: I want to enable GitHub issues for Accumulo's repos. This is not to replace JIRA but to give contributors more options for issue tracking. Unless there are objections, I will create an infra ticket this week.
Re: Additional options for issue tracking
I didn't ask for automated controls here -- human controls are fine. I have already said I am -1 on two concurrent issue trackers. If developers want to evaluate them, that's fine. On 2/14/18 10:12 PM, Christopher wrote: I don't think we have the ability to lock out non-committers from creating new GH issues if we enable them, nor do I think it would make sense to do so, since that's a valuable use case to consider during any trial period before shutting off JIRA. As for switching immediately to GH issues for non-primary repo (-website, -examples, -docker, etc...) I think that makes sense since those are already confusing when filed in the JIRA mixed in with the main repo's issues. On Wed, Feb 14, 2018 at 9:25 PM Josh Elser <els...@apache.org> wrote: I am OK with committers ONLY using GH issues on all repos (with clear guidance as to what the heck the project is doing) or doing a full-switch on the other repos. On 2/14/18 7:00 PM, Mike Miller wrote: We could do a trial period of GitHub issues for the accumulo sub-repos (accumulo-website, accumulo-examples...) then after a month or two decide to switch or not. That way we won't have duplicate issues or the confusion of having 2 trackers for one repository. On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote: +1 I think it makes sense to try out GitHub before shutting off JIRA. This period could be limited to a month or two. On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote: What if we had an interim transition period, tentatively using GitHub to determine its suitability for our workflows, and shut off JIRA later? On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote: I disagree with Mike in that I don't find JIRA to be so painful that it necessitates us changing, but I wouldn't block a move to GH issues if we turn off our JIRA use. On 2/14/18 4:29 PM, Mike Drob wrote: @josh - How do you feel about a move from JIRA to GH Issues completely? 
On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote: I believe I already stated -1 the last time this was brought up. Using two issue trackers is silly. On 2/14/18 3:30 PM, Mike Walch wrote: I want to enable GitHub issues for Accumulo's repos. This is not to replace JIRA but to give contributors more options for issue tracking. Unless there are objections, I will create an infra ticket this week.
Re: Additional options for issue tracking
I am OK with committers ONLY using GH issues on all repos (with clear guidance as to what the heck the project is doing) or doing a full-switch on the other repos. On 2/14/18 7:00 PM, Mike Miller wrote: We could do a trial period of GitHub issues for the accumulo sub-repos (accumulo-website, accumulo-examples...) then after a month or two decide to switch or not. That way we won't have duplicate issues or the confusion of having 2 trackers for one repository. On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote: +1 I think it makes sense to try out GitHub before shutting off JIRA. This period could be limited to a month or two. On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote: What if we had an interim transition period, tentatively using GitHub to determine its suitability for our workflows, and shut off JIRA later? On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote: I disagree with Mike in that I don't find JIRA to be so painful that it necessitates us changing, but I wouldn't block a move to GH issues if we turn off our JIRA use. On 2/14/18 4:29 PM, Mike Drob wrote: @josh - How do you feel about a move from JIRA to GH Issues completely? On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote: I believe I already stated -1 the last time this was brought up. Using two issue trackers is silly. On 2/14/18 3:30 PM, Mike Walch wrote: I want to enable GitHub issues for Accumulo's repos. This is not to replace JIRA but to give contributors more options for issue tracking. Unless there are objections, I will create an infra ticket this week.
Re: Additional options for issue tracking
I disagree with Mike in that I don't find JIRA to be so painful that it necessitates us changing, but I wouldn't block a move to GH issues if we turn off our JIRA use. On 2/14/18 4:29 PM, Mike Drob wrote: @josh - How do you feel about a move from JIRA to GH Issues completely? On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote: I believe I already stated -1 the last time this was brought up. Using two issue trackers is silly. On 2/14/18 3:30 PM, Mike Walch wrote: I want to enable GitHub issues for Accumulo's repos. This is not to replace JIRA but to give contributors more options for issue tracking. Unless there are objections, I will create an infra ticket this week.
Re: Additional options for issue tracking
I believe I already stated -1 the last time this was brought up. Using two issue trackers is silly. On 2/14/18 3:30 PM, Mike Walch wrote: I want to enable GitHub issues for Accumulo's repos. This is not to replace JIRA but to give contributors more options for issue tracking. Unless there are objections, I will create an infra ticket this week.
Re: [DISCUSS] Any interest in separate client/server tarballs
I think it would depend how much other "stuff" has to come in to support the *Clusters. I assumed it would be a bit, but, if it's not, I have no objections to a single jar. On 1/5/18 4:38 PM, Michael Wall wrote: Yeah, I was thinking more like your second paragraph. Thinking I would use the proposed client jar to develop against the MiniAccumuloCluster (typically the StandaloneMiniAccumuloCluster for me) and then deploy that code to run against a real cluster. Would like to flesh that usecase out a little more. Do you think it has to be another jar on top of the client jar? On Fri, Jan 5, 2018 at 4:31 PM Josh Elser <josh.el...@gmail.com> wrote: MAC, in its common state, is probably not something we'd want to include in this proposed tarball. The reasoning being that MAC (and related classes) aren't something that people would need on their "Hadoop Cluster" to talk to Accumulo. It's something that can just be obtained via Maven. However, if you're more referring to MAC as the generic "AccumuloCluster" interface (an attempt to make running tests against MAC and a real Accumulo cluster transparent -- StandaloneAccumuloCluster), then I could see some JAR that we'd include which would contain the necessary classes (on top of accumulo-client.jar) for users to run code seamlessly against a traditional MAC or the StandaloneAccumuloCluster. On 1/5/18 4:22 PM, Michael Wall wrote: I like the idea of a client jar that has less dependencies. Josh, where are you thinking the MiniAccumuloCluster fits in here? On Fri, Jan 5, 2018 at 3:57 PM Christopher <ctubb...@apache.org> wrote: On Fri, Jan 5, 2018 at 10:30 AM Keith Turner <ke...@deenlo.com> wrote: On Thu, Jan 4, 2018 at 7:43 PM, Christopher <ctubb...@apache.org> wrote: tl;dr : I would prefer not to add another tarball as part of our "official" I am not opposed to replacing the current single tarball with client and server tarballs. What I find appealing about this is if the client tarball has less deps. 
However I think a lot of thought should be put into the scripts if this is done. For example the client tar and server tar should probably not both have accumulo commands that do different things. Agreed on Keith's point about the scripts and it requiring some consideration. releases, but I'd be in favor of a blog post with instructions, a script, or a build profile, which users could read/execute/activate to create a client-centric package. I've long believed that supporting different downstream packaging scenarios should be prioritized over upstream binary packaging. I have argued in This "downstream" packaging could be done within the Apache Accumulo project also. Like accumulo-docker. Creating other packaging projects within Accumulo is something to consider. +1; When I say "downstream", it's a role, not an entity. The point is that it's a distinct activity. accumulo-docker is a perfect example of a "downstream packaging" project maintained by the upstream community. I find it frustrating sometimes when supporting users that they can't tell the difference between what is "Accumulo" and what is "this specific packaging/configuration/deployment of Accumulo", because we don't make those lines clear. I think we can draw these lines a bit more clearly. favor of removing our current tarball entirely, while supporting efforts to Apache Accumulo needs some sort of tarball that makes it easy to run the code on a cluster, otherwise how can we test Accumulo on a cluster for releases? A binary tarball may be the best for this, but it's little more than the jars in Maven Central and a few text files. It could be trivially replaced with a simple script and manifest; it could also be replaced with an RPM, a docker image, or any number of things. A tarball is just one type of packaging for Accumulo's binaries. In any case, I wasn't talking about removing the ability to produce a binary tarball from source. Only removing it from our release artifacts and downloads. 
It is not a popular opinion, but I still think it's reasonable, with both pros and cons. enable downstream packaging by modularizing the server code, supporting a client-API jar (future work), and decoupling code from launch scripts. I think we should continue to do these kinds of improvements to support different packaging scenarios downstream, but I'd prefer to avoid additional "official" binary releases. I agree, I think if the Accumulo Java code made fewer assumptions about its runtime env it would result in code that is easier to maintain and package for different environments. In Fluo we have recently done a lot of work in order to support Docker, Mesos, and Kubernetes. This work has really cleaned up the core Fluo code making it easier to run in any environment. I suspect pulling the Accumulo tar ball into a separate
Re: [DISCUSS] Any interest in separate client/server tarballs
So it makes sense to discuss them at this point, but I don't think they should block work on two tarballs if that seems like a good idea. Agreed. That discussion can be deferred. Much depends on how it is to be split up. Rather than provide additional packages, I'd prefer to work with downstream to make the source more "packagable" to suit the needs of these downstream vendor/community packagers. One way we can do that here is by either documenting what would be needed in a client-centric package, or by providing a script or build profile to create it from source, so that your $dayjob or any other downstream packager doesn't have to figure that out from scratch. On Thu, Jan 4, 2018 at 7:17 PM Josh Elser <josh.el...@gmail.com> wrote: Hi, $dayjob presented me with a request to break up the current tarball into two: one suitable for "users" and another for the Accumulo services. The ultimate goal is to make upgrade scenarios a bit easier by having client and server centric packaging. The "client" tarball would be something suitable for most users providing the ability to do things like: * Launch a java app against Accumulo * Launch a MapReduce job against Accumulo * Launch the Accumulo shell Essentially, the client tarball is just a pared down version of our "current" tarball and the server-tarball is likely equivalent to our "current" tarball (given that we have little code which would be considered client-only). Obviously, there are many ways to go about this. If there is buy-in from other folks, adding some new assembly descriptors and making it a part of the Maven build (perhaps, optionally generated) would be the easiest in terms of maintenance. However, I don't want to push for that if it's just going to be ignored by folks. I'll be creating something to support this one way or another. Any thoughts/opinions? Would this have any value to other folks? - Josh
Re: [DISCUSS] Any interest in separate client/server tarballs
One thing worth mentioning is that I will be doing this against $dayjob's 1.7 based branch to start. If the consensus is to only do this for a 2.0 Accumulo release, perhaps I can use my work to seed that effort? I'm thinking something like a document that lists what would be in such a client-tarball. On 1/5/18 11:35 AM, Keith Turner wrote: On Fri, Jan 5, 2018 at 11:24 AM, Mike Walch <mwa...@apache.org> wrote: I like the idea of client tarball. I think it will make things easier for users. However, I agree with Keith that we are going to need to split the accumulo command into accumulo-client & accumulo-server. I am interested in helping out with this as I have done a lot of work on the scripts in 2.0. 2.0 would be a good time for disruptive script changes. Could call client script accumulo and server script accumulo-server. Just thinking the client script is used more often so shorter would be nice. On Thu, Jan 4, 2018 at 7:16 PM, Josh Elser <josh.el...@gmail.com> wrote: Hi, $dayjob presented me with a request to break up the current tarball into two: one suitable for "users" and another for the Accumulo services. The ultimate goal is to make upgrade scenarios a bit easier by having client and server centric packaging. The "client" tarball would be something suitable for most users providing the ability to do things like: * Launch a java app against Accumulo * Launch a MapReduce job against Accumulo * Launch the Accumulo shell Essentially, the client tarball is just a pared down version of our "current" tarball and the server-tarball is likely equivalent to our "current" tarball (given that we have little code which would be considered client-only). Obviously, there are many ways to go about this. If there is buy-in from other folks, adding some new assembly descriptors and making it a part of the Maven build (perhaps, optionally generated) would be the easiest in terms of maintenance. 
However, I don't want to push for that if it's just going to be ignored by folks. I'll be creating something to support this one way or another. Any thoughts/opinions? Would this have any value to other folks? - Josh
Re: [DISCUSS] Any interest in separate client/server tarballs
I'd be worried about advertising something that we're not treating as official as it would languish (unless we create tests that can validate the result for us). Thanks for the input. On 1/4/18 7:43 PM, Christopher wrote: tl;dr : I would prefer not to add another tarball as part of our "official" releases, but I'd be in favor of a blog post with instructions, a script, or a build profile, which users could read/execute/activate to create a client-centric package. I've long believed that supporting different downstream packaging scenarios should be prioritized over upstream binary packaging. I have argued in favor of removing our current tarball entirely, while supporting efforts to enable downstream packaging by modularizing the server code, supporting a client-API jar (future work), and decoupling code from launch scripts. I think we should continue to do these kinds of improvements to support different packaging scenarios downstream, but I'd prefer to avoid additional "official" binary releases. Rather than provide additional packages, I'd prefer to work with downstream to make the source more "packagable" to suit the needs of these downstream vendor/community packagers. One way we can do that here is by either documenting what would be needed in a client-centric package, or by providing a script or build profile to create it from source, so that your $dayjob or any other downstream packager doesn't have to figure that out from scratch. On Thu, Jan 4, 2018 at 7:17 PM Josh Elser <josh.el...@gmail.com> wrote: Hi, $dayjob presented me with a request to break up the current tarball into two: one suitable for "users" and another for the Accumulo services. The ultimate goal is to make upgrade scenarios a bit easier by having client and server centric packaging. 
The "client" tarball would be something suitable for most users providing the ability to do things like: * Launch a java app against Accumulo * Launch a MapReduce job against Accumulo * Launch the Accumulo shell Essentially, the client tarball is just a pared down version of our "current" tarball and the server-tarball is likely equivalent to our "current" tarball (given that we have little code which would be considered client-only). Obviously, there are many ways to go about this. If there is buy-in from other folks, adding some new assembly descriptors and making it a part of the Maven build (perhaps, optionally generated) would be the easiest in terms of maintenance. However, I don't want to push for that if it's just going to be ignored by folks. I'll be creating something to support this one way or another. Any thoughts/opinions? Would this have any value to other folks? - Josh
Re: [DISCUSS] Any interest in separate client/server tarballs
On 1/5/18 9:55 AM, Keith Turner wrote: Obviously, there are many ways to go about this. If there is buy-in from other folks, adding some new assembly descriptors and making it a part of the Maven build (perhaps, optionally generated) would be the easiest in terms of maintenance. However, I don't want to push for that if it's just going to be ignored by folks. I'll be creating something to support this one way or another. Do you have anything to share? I would be interested in reviewing this. Nothing yet. My plan is to take the stock bin-tarball, split the files up into two lists to make sure I have the separation correct (that things actually work). Then, I can implement it however we want. Any thoughts/opinions? Would this have any value to other folks? This is slightly unrelated, but it would be nice to lower the number of dependencies for the client side code and possibly shade in libthrift. Yup. Agreed.
[DISCUSS] Any interest in separate client/server tarballs
Hi, $dayjob presented me with a request to break up the current tarball into two: one suitable for "users" and another for the Accumulo services. The ultimate goal is to make upgrade scenarios a bit easier by having client and server centric packaging. The "client" tarball would be something suitable for most users providing the ability to do things like: * Launch a java app against Accumulo * Launch a MapReduce job against Accumulo * Launch the Accumulo shell Essentially, the client tarball is just a pared down version of our "current" tarball and the server-tarball is likely equivalent to our "current" tarball (given that we have little code which would be considered client-only). Obviously, there are many ways to go about this. If there is buy-in from other folks, adding some new assembly descriptors and making it a part of the Maven build (perhaps, optionally generated) would be the easiest in terms of maintenance. However, I don't want to push for that if it's just going to be ignored by folks. I'll be creating something to support this one way or another. Any thoughts/opinions? Would this have any value to other folks? - Josh
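To make the "new assembly descriptors" option above concrete: it would mean a second maven-assembly-plugin descriptor whose dependency set is filtered down to client-side jars. A rough sketch — the file path and include patterns are placeholders for illustration, since deciding the real client/server split is exactly the open question:

```xml
<!-- Hypothetical client assembly descriptor, e.g. src/main/assemblies/client.xml -->
<assembly>
  <id>client</id>
  <formats>
    <format>tar.gz</format>
  </formats>
  <dependencySets>
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <includes>
        <!-- placeholder patterns; the vetted list is the real work -->
        <include>org.apache.accumulo:accumulo-core</include>
        <include>org.apache.thrift:libthrift</include>
      </includes>
    </dependencySet>
  </dependencySets>
  <fileSets>
    <fileSet>
      <!-- client-only launch scripts would be gathered here -->
      <directory>assemble/bin</directory>
      <outputDirectory>bin</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>
```

Binding this descriptor to an optional profile would give the "perhaps, optionally generated" behavior mentioned above.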
Re: Test replication
You can configure a replication peer which is the "local" Accumulo instance. I think there are some ITs which do this. On 1/4/18 4:13 PM, Mike Miller wrote: Trying to test a fix for the 2.0 Monitor https://issues.apache.org/jira/browse/ACCUMULO-4760 and I wanted to enable replication. Does anyone know if there is a way to enable it when running a single Uno instance? I just need to "turn it on" so I can see if the Monitor is reporting correctly.
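Pointing replication back at the same instance amounts to setting the peer properties to the local instance's own name and ZooKeepers. Roughly, from the 1.7-era replication docs — the instance name, table name, and table id below are placeholders, and the ReplicaSystem class name is worth double-checking against the ITs:

```
# in the accumulo shell, on a single Uno instance
config -s replication.name=primary
config -s replication.peer.self=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,uno,localhost:2181
config -t mytable -s table.replication=true
# target value is the table id on the "peer" (here, the same instance)
config -t mytable -s table.replication.target.self=2
```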
Re: [DISCUSS] Hadoop3 support target?
On 12/5/17 6:43 PM, Christopher wrote: I was wondering about Hadoop 3 shading and whether that would help us. It would be really nice if it could, or if there was some other class path solution that was easy. I think there are two major issues in this thread. The first is the API problems. The second is the Hadoop 3 support. They are related, but I think quickly dealing with the API issues can clarify what our options are for Hadoop 3. In the spirit of trying to keep these issues separate (I think Christopher is correct) https://github.com/apache/accumulo/pull/332 If we switch to using the new shaded jars from Hadoop, we can avoid coupling these issues at all. This comes with caveats as 3.0.0-beta1 is busted (https://issues.apache.org/jira/browse/HADOOP-15058). Building a 3.0.1-SNAPSHOT locally and using that let me run all of the unit tests which is promising. Going to kick off the ITs and see how they fare.
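The shaded-jar route described above boils down to compiling against Hadoop's shaded client artifacts instead of the classic hadoop-client dependency; a sketch (version numbers illustrative, matching the 3.0.1-SNAPSHOT workaround mentioned for HADOOP-15058):

```xml
<!-- Hadoop 3 shaded client pair: compile against the api jar,
     pull the runtime jar in at runtime scope only -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.1</version>
  <scope>runtime</scope>
</dependency>
```

Because Hadoop's transitive dependencies are relocated inside these jars, they stop leaking onto Accumulo's classpath, which is what decouples the Hadoop 3 question from the API question.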
Re: [DISCUSS] Hadoop3 support target?
On 12/6/17 2:06 PM, Christopher wrote: On Wed, Dec 6, 2017 at 1:55 PM Keith Turner <ke...@deenlo.com> wrote: On Wed, Dec 6, 2017 at 1:43 PM, Josh Elser <els...@apache.org> wrote: On 12/6/17 12:17 PM, Keith Turner wrote: On Wed, Dec 6, 2017 at 11:56 AM, Josh Elser <els...@apache.org> wrote: Maybe a difference in interpretation: I was seeing 1a as being source-compatible still. My assumption was that "Deprecate ClientConfiguration" meant that it would remain in the codebase -- "replace" as in "replace expected user invocation", not removal of the old ClientConfiguration and addition of a new ClientConfig class. Ok, if we deprecate ClientConfiguration, leave it in 2.0, and drop the extends from ClientConfiguration in 2.0. Then I am not sure what the benefit of introducing the new ClientConfig type is? I read this as leaving the extends in ClientConfiguration and dropping that in the new ClientConfig. Agree, I wouldn't see the point in changing the parent class of ClientConfiguration (as that would break things). I don't think we can leave ClientConfiguration as deprecated and extending commons config in Accumulo 2.0. This leaves commons config 1 in the API. Personally I am not in favor of dropping ClientConfiguration in 2.0, which is why I was in favor of option b. In the absence of any further input from others, I'll follow along with whatever you and Josh can agree on. Although I lean towards option 1.a, I don't feel strongly about either option. We can also do a vote if neither of you is able (or willing) to convince the other of your preference. I don't feel strongly enough either way to raise a stink. Color me surprised that Keith is the one to encourage quick removals from API :) If he's OK with it, I'm fine with it. I was trying to err on the side of less breakage.
Re: [DISCUSS] Hadoop3 support target?
On 12/6/17 12:17 PM, Keith Turner wrote: On Wed, Dec 6, 2017 at 11:56 AM, Josh Elser<els...@apache.org> wrote: Maybe a difference in interpretation: I was seeing 1a as being source-compatible still. My assumption was that "Deprecate ClientConfiguration" meant that it would remain in the codebase -- "replace" as in "replace expected user invocation", not removal of the old ClientConfiguration and addition of a new ClientConfig class. Ok, if we deprecate ClientConfiguration, leave it in 2.0, and drop the extends from ClientConfiguration in 2.0. Then I am not sure what the benefit of introducing the new ClientConfig type is? I read this as leaving the extends in ClientConfiguration and dropping that in the new ClientConfig. Agree, I wouldn't see the point in changing the parent class of ClientConfiguration (as that would break things).
Re: [DISCUSS] Hadoop3 support target?
Maybe a difference in interpretation: I was seeing 1a as being source-compatible still. My assumption was that "Deprecate ClientConfiguration" meant that it would remain in the codebase -- "replace" as in "replace expected user invocation", not removal of the old ClientConfiguration and addition of a new ClientConfig class. On 12/6/17 11:29 AM, Keith Turner wrote: On Wed, Dec 6, 2017 at 11:28 AM, Josh Elser <els...@apache.org> wrote: 1.a sounds better to me. why? A would be the ideal solution, I think B is the next best if A doesn't work. I need to get the Hadoop3 compatibility fixed, so I'll be investigating the Hadoop shaded artifacts this week. On 12/5/17 6:43 PM, Christopher wrote: I was wondering about Hadoop 3 shading and whether that would help us. It would be really nice if it could, or if there was some other class path solution that was easy. I think there are two major issues in this thread. The first is the API problems. The second is the Hadoop 3 support. They are related, but I think quickly dealing with the API issues can clarify what our options are for Hadoop 3. To fix the API, I would like to get consensus on proceeding with this path: 1. Rename 1.8.2-SNAPSHOT to 1.9.0-SNAPSHOT and deprecate the existing ZooKeeperInstance constructor which takes a Configuration a) Deprecate ClientConfiguration and replace with ClientConfig (or a better name) which does not extend Configuration or have API leak problems, and add a new ZKI constructor for this b) Ignore extends for now, and drop it from ClientConfiguration in 2.0 with a break (can't deprecate superclass), and add new ZKI constructor for more specific ClientConfiguration next to deprecated one 2. Drop deprecated stuff from 2.0 branch (and extends, if option 1.b was chosen) 3. Plan a 1.9.0 release instead of 1.8.2 I prefer 1.a over 1.b, personally, but I've been tossing back and forth. I would need input on which is best. 
There are pros and cons to both, regarding churn, and source and binary compatibility. Once we deal with the API, our options for Hadoop 3 become: A. Use Hadoop 3 shaded artifacts or some other class path solution (such as getting lucky identifying a version of commons-beanutils that works for both) B. Shade in 1.9 with a breaking change C. Create a 1.9 version named 2.0, so we can do a breaking change without semver violation; shade in this version D. Shade in the branch we're currently calling 2.0 I think we can defer that decision pending some further investigation/experimentation into what works, and deal with it after dealing with steps 1-3 above (but soon after, hopefully). On Tue, Dec 5, 2017 at 3:58 PM Josh Elser <els...@apache.org> wrote: Another potential suggestion I forgot about: we try to just move to the Hadoop shaded artifacts. This would invalidate the need to do more, but I have no idea how "battle-tested" those artifacts are. On 12/5/17 3:52 PM, Keith Turner wrote: If we do the following. * Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method. * Drop extends from ClientConfig * Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig config) Then this will not be binary compatible, so it will still be painful in many cases. It may be source compatible. For example the following will be source (but not binary) compatible. ClientConfiguration cc = new ClientConfiguration(file); //when compiled against older version of Accumulo will bind to method with commons config signature //when recompiled will bind to clientconfig version of method ZooKeeperInstance zki = new ZooKeeperInstance(cc); The following would not be source or binary compatible. 
Configuration cc = new ClientConfiguration(file); ZooKeeperInstance zki = new ZooKeeperInstance(cc); On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote: On 12/5/17 3:28 PM, Keith Turner wrote: On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org> wrote: Interesting. What makes you want to deprecate ClientConfig entirely? I'd be worried about removing without sufficient thought of replacement around. It would be a bit "churn-y" to introduce yet another way that clients have to connect (since it was introduced in 1.6-ish?). Working around the ClientConfig changes was irritating for the downstream integrations (Hive, most notably). Ok maybe thats a bad idea, not looking to cause pain. Here were some of my goals. * Remove commons config from API completely via deprecation cycle. * Introduce API that supports putting all props needed to connect to Accumulo in an API. I suppose if we want to keep ClientConfig class in API, then there is no way to remove commons config via a deprecation cycle?? We can't deprecate the extension of commons config, all we can do is just drop it at some point. My line of thinking is that the majority of the time, we're creating a ClientConfiguration by one of: * ClientConfiguration#loadDefault() * new ClientConfiguration(String) * new ClientConfiguration(File) Granted, we also inherit/expose a few other things (notably extending CompositeConfiguration and throwing ConfigurationException). I would be comfortable with dropping those w/o deprecation. I have not seen evidence from anyone that they are widely in use by folks (although I've not explicitly asked, either).
Re: [DISCUSS] Hadoop3 support target?
1.a sounds better to me. A would be the ideal solution, I think B is the next best if A doesn't work. I need to get the Hadoop3 compatibility fixed, so I'll be investigating the Hadoop shaded artifacts this week. On 12/5/17 6:43 PM, Christopher wrote: I was wondering about Hadoop 3 shading and whether that would help us. It would be really nice if it could, or if there was some other class path solution that was easy. I think there are two major issues in this thread. The first is the API problems. The second is the Hadoop 3 support. They are related, but I think quickly dealing with the API issues can clarify what our options are for Hadoop 3. To fix the API, I would like to get consensus on proceeding with this path: 1. Rename 1.8.2-SNAPSHOT to 1.9.0-SNAPSHOT and deprecate the existing ZooKeeperInstance constructor which takes a Configuration a) Deprecate ClientConfiguration and replace with ClientConfig (or a better name) which does not extend Configuration or have API leak problems, and add a new ZKI constructor for this b) Ignore extends for now, and drop it from ClientConfiguration in 2.0 with a break (can't deprecate superclass), and add new ZKI constructor for more specific ClientConfiguration next to deprecated one 2. Drop deprecated stuff from 2.0 branch (and extends, if option 1.b was chosen) 3. Plan a 1.9.0 release instead of 1.8.2 I prefer 1.a over 1.b, personally, but I've been tossing back and forth. I would need input on which is best. There are pros and cons to both, regarding churn, and source and binary compatibility. Once we deal with the API, our options for Hadoop 3 become: A. Use Hadoop 3 shaded artifacts or some other class path solution (such as getting lucky identifying a version of commons-beanutils that works for both) B. Shade in 1.9 with a breaking change C. Create a 1.9 version named 2.0, so we can do a breaking change without semver violation; shade in this version D. 
Shade in the branch we're currently calling 2.0 I think we can defer that decision pending some further investigation/experimentation into what works, and deal with it after dealing with steps 1-3 above (but soon after, hopefully). On Tue, Dec 5, 2017 at 3:58 PM Josh Elser <els...@apache.org> wrote: Another potential suggestion I forgot about: we try to just move to the Hadoop shaded artifacts. This would invalidate the need to do more, but I have no idea how "battle-tested" those artifacts are. On 12/5/17 3:52 PM, Keith Turner wrote: If we do the following. * Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method. * Drop extends from ClientConfig * Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig config) Then this will not be binary compatible, so it will still be painful in many cases. It may be source compatible. For example the following will be source (but not binary) compatible. ClientConfiguration cc = new ClientConfiguration(file); //when compiled against older version of Accumulo will bind to method with commons config signature //when recompiled will bind to clientconfig version of method ZooKeeperInstance zki = new ZooKeeperInstance(cc); The following would not be source or binary compatible. Configuration cc = new ClientConfiguration(file); ZooKeeperInstance zki = new ZooKeeperInstance(cc); On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote: On 12/5/17 3:28 PM, Keith Turner wrote: On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org> wrote: Interesting. What makes you want to deprecate ClientConfig entirely? I'd be worried about removing without sufficient thought of replacement around. It would be a bit "churn-y" to introduce yet another way that clients have to connect (since it was introduced in 1.6-ish?). Working around the ClientConfig changes was irritating for the downstream integrations (Hive, most notably). Ok maybe thats a bad idea, not looking to cause pain. Here were some of my goals. 
* Remove commons config from API completely via deprecation cycle. * Introduce API that supports putting all props needed to connect to Accumulo in an API. I suppose if we want to keep ClientConfig class in API, then there is no way to remove commons config via a deprecation cycle?? We can't deprecate the extension of commons config, all we can do is just drop it at some point. My line of thinking is that the majority of the time, we're creating a ClientConfiguration by one of: * ClientConfiguration#loadDefault() * new ClientConfiguration(String) * new ClientConfiguration(File) Granted, we also inherit/expose a few other things (notably extending CompositeConfiguration and throwing ConfigurationException). I would be comfortable with dropping those w/o deprecation. I have not seen evidence from anyone that they are widely in use by folks (although I've not explicitly asked, either).
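[Editor's note] The source-vs-binary compatibility point Keith makes above comes down to Java overload resolution: the compiler binds a constructor call by the *declared* type of the argument, while already-compiled bytecode has the old signature baked in. The behavior can be reproduced with a self-contained sketch (these stand-in classes are illustrative only, not the real Accumulo or commons-config types):

```java
// Stand-in for org.apache.commons.configuration.Configuration.
class Configuration {}

// Stand-in for Accumulo's ClientConfiguration, which (in 1.x) extends it.
class ClientConfiguration extends Configuration {}

class ZooKeeperInstance {
    final String boundTo;

    // Old constructor: the commons-config signature.
    ZooKeeperInstance(Configuration config) { boundTo = "Configuration"; }

    // New, more specific constructor added alongside it.
    ZooKeeperInstance(ClientConfiguration config) { boundTo = "ClientConfiguration"; }
}

public class OverloadDemo {
    public static void main(String[] args) {
        // Declared as ClientConfiguration: the compiler picks the most
        // specific applicable overload. Code compiled against the OLD
        // Accumulo binds to the Configuration overload; a recompile
        // silently rebinds to the new one. Source compatible, not
        // binary compatible if the old overload is later removed.
        ClientConfiguration cc = new ClientConfiguration();
        System.out.println(new ZooKeeperInstance(cc).boundTo); // ClientConfiguration

        // Declared as Configuration: only the old overload applies, so
        // dropping it breaks this caller at both compile and link time.
        Configuration c2 = new ClientConfiguration();
        System.out.println(new ZooKeeperInstance(c2).boundTo); // Configuration
    }
}
```

This is why Keith's first example survives a recompile while the second is neither source nor binary compatible once the `Configuration` overload goes away.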
Re: [DISCUSS] Hadoop3 support target?
Another potential suggestion I forgot about: we try to just move to the Hadoop shaded artifacts. This would invalidate the need to do more, but I have no idea how "battle-tested" those artifacts are. On 12/5/17 3:52 PM, Keith Turner wrote: If we do the following. * Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method. * Drop extends from ClientConfig * Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig config) Then this will not be binary compatible, so it will still be painful in many cases. It may be source compatible. For example the following will be source (but not binary) compatible. ClientConfiguration cc = new ClientConfiguration(file); //when compiled against older version of Accumulo will bind to method with commons config signature //when recompiled will bind to clientconfig version of method ZooKeeperInstance zki = new ZooKeeperInstance(cc); The following would not be source or binary compatible. Configuration cc = new ClientConfiguration(file); ZooKeeperInstance zki = new ZooKeeperInstance(cc); On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote: On 12/5/17 3:28 PM, Keith Turner wrote: On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org> wrote: Interesting. What makes you want to deprecate ClientConfig entirely? I'd be worried about removing without sufficient thought of replacement around. It would be a bit "churn-y" to introduce yet another way that clients have to connect (since it was introduced in 1.6-ish?). Working around the ClientConfig changes was irritating for the downstream integrations (Hive, most notably). Ok maybe thats a bad idea, not looking to cause pain. Here were some of my goals. * Remove commons config from API completely via deprecation cycle. * Introduce API that supports putting all props needed to connect to Accumulo in an API. I suppose if we want to keep ClientConfig class in API, then there is no way to remove commons config via a deprecation cycle?? 
We can't deprecate the extension of commons config, all we can do is just drop it at some point. My line of thinking is that the majority of the time, we're creating a ClientConfiguration by one of: * ClientConfiguration#loadDefault() * new ClientConfiguration(String) * new ClientConfiguration(File) Granted, we also inherit/expose a few other things (notably extending CompositeConfiguration and throwing ConfigurationException). I would be comfortable with dropping those w/o deprecation. I have not seen evidence from anyone that they are widely in use by folks (although I've not explicitly asked, either).
Re: [DISCUSS] Hadoop3 support target?
On 12/5/17 3:28 PM, Keith Turner wrote: On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org> wrote: Interesting. What makes you want to deprecate ClientConfig entirely? I'd be worried about removing without sufficient thought of replacement around. It would be a bit "churn-y" to introduce yet another way that clients have to connect (since it was introduced in 1.6-ish?). Working around the ClientConfig changes was irritating for the downstream integrations (Hive, most notably). Ok maybe thats a bad idea, not looking to cause pain. Here were some of my goals. * Remove commons config from API completely via deprecation cycle. * Introduce API that supports putting all props needed to connect to Accumulo in an API. I suppose if we want to keep ClientConfig class in API, then there is no way to remove commons config via a deprecation cycle?? We can't deprecate the extension of commons config, all we can do is just drop it at some point. My line of thinking is that the majority of the time, we're creating a ClientConfiguration by one of: * ClientConfiguration#loadDefault() * new ClientConfiguration(String) * new ClientConfiguration(File) Granted, we also inherit/expose a few other things (notably extending CompositeConfiguration and throwing ConfigurationException). I would be comfortable with dropping those w/o deprecation. I have not seen evidence from anyone that they are widely in use by folks (although I've not explicitly asked, either).
Re: [DISCUSS] Hadoop3 support target?
Interesting. What makes you want to deprecate ClientConfig entirely? I'd be worried about removing without sufficient thought of replacement around. It would be a bit "churn-y" to introduce yet another way that clients have to connect (since it was introduced in 1.6-ish?). Working around the ClientConfig changes was irritating for the downstream integrations (Hive, most notably). On 12/5/17 1:13 PM, Keith Turner wrote: I was thinking of a slightly different path forward. * Add new entry point and deprecate clientconfig in 1.9 * Branch 1.9 off 1.8 * Stop releasing 1.8.x in favor of 1.9.x (they are the same except for new API) * Release 1.9 ASAP * Drop clientconfig in 2.0.0 * Release 2.0.0 early next year... maybe target March On Tue, Dec 5, 2017 at 12:51 PM, Josh Elser <els...@apache.org> wrote: Ok, a bridge version seems to be a general path forward. Generally this would be... * 1.8 gets relevant commons-config classes/methods deprecated * 1.9 is 1.8 with those deprecation points removed * 1.9 has commons-config shaded (maybe?) IMO, it's critical that we remove the commons-config stuff from our public API (shame this somehow was let in to begin). I think shading our use of commons-config would be a good idea and lessen our ClientConfiguration scope to being able to read from a file. Trying to support the breadth of what commons-configuration can do will just get us into more trouble. On 12/5/17 12:18 PM, Keith Turner wrote: If we are going to deprecate, then it would be nice to have a replacement. One thing that has irked me about the current Accumulo entry point is that one can not specify everything needed to connect to in a single props file. Specifically, credentials can not be specified. It would be really nice to have a new entry point that allows this. We could release a 1.9 bridge version. This version would be based on 1.8 and only include a new entry point. Base it on 1.8 in order to allow a low risk upgrade for anyone currently using 1.8. 
Once people start using 1.9 they can have code that uses the old and new entry point running at the same time. In 2.0 we can drop the problematic entry point. Below is a commit to 1.8 where I was experimenting with a new entry point. https://github.com/keith-turner/accumulo/commit/1c07fa62e9c57bde7e60907595d50f898d03c9d5 This new API would need review, its rough and there are some things I don't like about it. Just sharing for discussion of general concept, not advocating for this specific API. On Mon, Dec 4, 2017 at 6:27 PM, Dave Marion <dmario...@gmail.com> wrote: There is no reason that you can't mark the offending API methods as deprecated in a 1.8.x release, then immediately branch off of that to create a 2.0 and remove the method. Alternatively, we could decide to forego the semver rules for a specific release and make sure to point it out in the release notes. -Original Message- From: Josh Elser [mailto:els...@apache.org] Sent: Monday, December 4, 2017 6:19 PM To: dev@accumulo.apache.org Subject: Re: [DISCUSS] Hadoop3 support target? Also, just to be clear for everyone else: This means that we have *no roadmap* at all for Hadoop 3 support because Accumulo 2.0 is in a state of languish. This is a severe enough problem to me that I would consider breaking API compatibility and fixing the API leak in 1.7/1.8. I'm curious what people other than Christopher think (assuming from his comments/JIRA work that he disagrees with me). On 12/4/17 6:12 PM, Christopher wrote: Agreed. On Mon, Dec 4, 2017 at 6:01 PM Josh Elser <els...@apache.org> wrote: Ah, I'm seeing now -- didn't check my inbox appropriately. I think the fact that code that we don't own has somehow been allowed to be public API is the smell. That's something that needs to be rectified sooner than later. By that measure, it can *only* land on Accumulo 2.0 (which is going to be a major issue for the project). On 12/4/17 5:58 PM, Josh Elser wrote: Sorry, I don't follow. 
Why do you think 4611/4753 is a show-stopper? Cuz, uh... I made it work already :) Thanks for the JIRA cleanup. Forgot about that one. On 12/4/17 5:55 PM, Christopher wrote: I don't think we can support it with 1.8 or earlier, because of some serious incompatibilities (namely, ACCUMULO-4611/4753) I think people are still patching 1.7, so I don't think we've "officially" EOL'd it. I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable. On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote: What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
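[Editor's note] The "single props file" entry point Keith argues for in this thread, one file carrying everything needed to connect, including credentials, might look something like the following. The property names here are purely hypothetical, for illustration; they are not the keys used by his experimental commit or by any released API:

```properties
# Everything needed to build a connection, in one file --
# including the credentials that ClientConfiguration could not hold.
instance.name=myinstance
instance.zookeepers=zk1:2181,zk2:2181,zk3:2181
auth.principal=myuser
auth.type=password
auth.token=mysecret
```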
Re: [DISCUSS] Hadoop3 support target?
Ok, a bridge version seems to be a general path forward. Generally this would be... * 1.8 gets relevant commons-config classes/methods deprecated * 1.9 is 1.8 with those deprecation points removed * 1.9 has commons-config shaded (maybe?) IMO, it's critical that we remove the commons-config stuff from our public API (shame this somehow was let in to begin). I think shading our use of commons-config would be a good idea and lessen our ClientConfiguration scope to being able to read from a file. Trying to support the breadth of what commons-configuration can do will just get us into more trouble. On 12/5/17 12:18 PM, Keith Turner wrote: If we are going to deprecate, then it would be nice to have a replacement. One thing that has irked me about the current Accumulo entry point is that one can not specify everything needed to connect to in a single props file. Specifically, credentials can not be specified. It would be really nice to have a new entry point that allows this. We could release a 1.9 bridge version. This version would be based on 1.8 and only include a new entry point. Base it on 1.8 in order to allow a low risk upgrade for anyone currently using 1.8. Once people start using 1.9 they can have code that uses the old and new entry point running at the same time. In 2.0 we can drop the problematic entry point. Below is a commit to 1.8 where I was experimenting with a new entry point. https://github.com/keith-turner/accumulo/commit/1c07fa62e9c57bde7e60907595d50f898d03c9d5 This new API would need review, its rough and there are some things I don't like about it. Just sharing for discussion of general concept, not advocating for this specific API. On Mon, Dec 4, 2017 at 6:27 PM, Dave Marion <dmario...@gmail.com> wrote: There is no reason that you can't mark the offending API methods as deprecated in a 1.8.x release, then immediately branch off of that to create a 2.0 and remove the method. 
Alternatively, we could decide to forego the semver rules for a specific release and make sure to point it out in the release notes. -Original Message- From: Josh Elser [mailto:els...@apache.org] Sent: Monday, December 4, 2017 6:19 PM To: dev@accumulo.apache.org Subject: Re: [DISCUSS] Hadoop3 support target? Also, just to be clear for everyone else: This means that we have *no roadmap* at all for Hadoop 3 support because Accumulo 2.0 is in a state of languish. This is a severe enough problem to me that I would consider breaking API compatibility and fixing the API leak in 1.7/1.8. I'm curious what people other than Christopher think (assuming from his comments/JIRA work that he disagrees with me). On 12/4/17 6:12 PM, Christopher wrote: Agreed. On Mon, Dec 4, 2017 at 6:01 PM Josh Elser <els...@apache.org> wrote: Ah, I'm seeing now -- didn't check my inbox appropriately. I think the fact that code that we don't own has somehow been allowed to be public API is the smell. That's something that needs to be rectified sooner than later. By that measure, it can *only* land on Accumulo 2.0 (which is going to be a major issue for the project). On 12/4/17 5:58 PM, Josh Elser wrote: Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper? Cuz, uh... I made it work already :) Thanks for the JIRA cleanup. Forgot about that one. On 12/4/17 5:55 PM, Christopher wrote: I don't think we can support it with 1.8 or earlier, because of some serious incompatibilities (namely, ACCUMULO-4611/4753) I think people are still patching 1.7, so I don't think we've "officially" EOL'd it. I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable. On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote: What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. 
In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
Re: [DISCUSS] Hadoop3 support target?
Also, just to be clear for everyone else: This means that we have *no roadmap* at all for Hadoop 3 support because Accumulo 2.0 is in a state of languish. This is a severe enough problem to me that I would consider breaking API compatibility and fixing the API leak in 1.7/1.8. I'm curious what people other than Christopher think (assuming from his comments/JIRA work that he disagrees with me). On 12/4/17 6:12 PM, Christopher wrote: Agreed. On Mon, Dec 4, 2017 at 6:01 PM Josh Elser <els...@apache.org> wrote: Ah, I'm seeing now -- didn't check my inbox appropriately. I think the fact that code that we don't own has somehow been allowed to be public API is the smell. That's something that needs to be rectified sooner than later. By that measure, it can *only* land on Accumulo 2.0 (which is going to be a major issue for the project). On 12/4/17 5:58 PM, Josh Elser wrote: Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper? Cuz, uh... I made it work already :) Thanks for the JIRA cleanup. Forgot about that one. On 12/4/17 5:55 PM, Christopher wrote: I don't think we can support it with 1.8 or earlier, because of some serious incompatibilities (namely, ACCUMULO-4611/4753) I think people are still patching 1.7, so I don't think we've "officially" EOL'd it. I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable. On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote: What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
Re: [DISCUSS] Hadoop3 support target?
Ah, I'm seeing now -- didn't check my inbox appropriately. I think the fact that code that we don't own has somehow been allowed to be public API is the smell. That's something that needs to be rectified sooner than later. By that measure, it can *only* land on Accumulo 2.0 (which is going to be a major issue for the project). On 12/4/17 5:58 PM, Josh Elser wrote: Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper? Cuz, uh... I made it work already :) Thanks for the JIRA cleanup. Forgot about that one. On 12/4/17 5:55 PM, Christopher wrote: I don't think we can support it with 1.8 or earlier, because of some serious incompatibilities (namely, ACCUMULO-4611/4753) I think people are still patching 1.7, so I don't think we've "officially" EOL'd it. I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable. On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote: What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
Re: [DISCUSS] Hadoop3 support target?
Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper? Cuz, uh... I made it work already :) Thanks for the JIRA cleanup. Forgot about that one. On 12/4/17 5:55 PM, Christopher wrote: I don't think we can support it with 1.8 or earlier, because of some serious incompatibilities (namely, ACCUMULO-4611/4753) I think people are still patching 1.7, so I don't think we've "officially" EOL'd it. I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable. On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote: What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
[DISCUSS] Hadoop3 support target?
What branch do we want to consider Hadoop3 support? There is a 3.0.0-beta1 release that's been out for a while, and Hadoop PMC has already done a 3.0.0 RC0. I think it's the right time to start considering this. In my poking so far, I've filed ACCUMULO-4753 which I'm working through now. This does raise the question: where do we want to say we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?) - Josh https://issues.apache.org/jira/browse/ACCUMULO-4753
Re: [DISCUSS] Moving away from Thrift
On 11/17/17 10:32 AM, Christopher wrote: On Fri, Nov 17, 2017 at 8:21 AM Josh Elser<els...@apache.org> wrote: Did you offer to make the release? See me with commons-vfs a time back. The current issue with Thrift is not the point. The problems we've encountered with Thrift were provided as background context only. I seriously think you are avoiding all of the good that Thrift provides us for the sake of a platform to discuss your distaste. Take a look at the amount of code that makes up Hadoop's or HBase's RPC implementations and the corresponding (often nasty) bugs that have come up over the years. There are numerous things which Thrift continues to do very well that have never become problems for us in Accumulo. Having seen the other side of the fence, I would happily take Thrift (warts and all) any day over the alternatives. Your proposal seems to me like you're blowing the situation out of proportion. I haven't proposed we do anything beyond "consider" or "discuss". I don't think "consider" or "discuss" are "out of proportion", even if Thrift had zero problems. /me blinks. Ok then.
Re: [DISCUSS] Moving away from Thrift
Did you offer to make the release? See me with commons-vfs a time back. Your proposal seems to me like you're blowing the situation out of proportion. On Nov 16, 2017 23:58, "Christopher"wrote: > The current Thrift issue has already been fixed with a patch. Their PMC > needs to release it, though. > > Following ASF's commitment to "community over code", I think it would be > inappropriate for an Apache project to fork another active project while > that community still exists. It's better to work with them if we can, and > to use another dependency if we can't. There may be ASF policy against such > forking, but that may only apply to forking non-ASF projects. In any case, > I don't think it's a good idea. > > Also, even if we are able to resolve the current issue of releasing a > version without the spammy print statement, I think there's value in > discussing possible alternatives and their pros/cons. There's no timeline > for this. Consider this an open-ended discussion regarding RPC > alternatives. I just want to gather those alternatives into one place to > discuss. > > > On Thu, Nov 16, 2017 at 11:43 PM Ed Coleman wrote: > > > Have we tried fixing the current issue and then submitting a > pull-request? > > > > I'd favor first submitting a pull request and any other help that we can > > provide to get it adopted and released soon - failing that we could fork > > the project and go from there. That could offer us a path to correct the > > immediate issue and offer time to consider other alternatives. > > > > Ed Coleman > > > > -Original Message- > > From: Christopher [mailto:ctubb...@apache.org] > > Sent: Thursday, November 16, 2017 11:36 PM > > To: accumulo-dev > > Subject: [DISCUSS] Moving away from Thrift > > > > Accumulo Devs, > > > > I think it's time we start seriously thinking about moving away from > > Thrift and considering alternatives. > > For me, https://issues.apache.org/jira/browse/THRIFT-4062 is becoming > the > > last straw. 
> > > > Thrift is a neat idea, but to be blunt: there seems to be a fundamental > > lack of care or interest from the Thrift developers at the current > moment. > > > > Some of the problems we've seen over the years: Every version is > > fundamentally incompatible with other versions. Repeated flip-flopping > > regressions seems to occur with each release. Fundamental design concepts > > like distinguishing server-side exceptions (TApplicationException vs. > > TException) are undermined without consideration of the initial design. > > And now, a serious bug (a spammy debugging print statement) was left in > for > > nearly a year now (still exists in current version), and no response from > > the PMC to indicate any willingness to release a fix. Repeated requests > to > > the developer list has gone ignored. And, I'm not even counting my > requests > > for assistance debugging a compiler issue on s390x arch having also gone > > ignored. > > > > These problems are not exclusive to Accumulo. Many of these are problems > > that Cassandra has also faced, and I'm sure there are others. > > > > It's possible that Thrift can remedy the situation. None of these > problems > > are insurmountable, and none of them are beyond fixes, particularly if we > > can afford to volunteer more to help out. My intention is not to throw a > > fellow Apache project under the bus, and I do not intend to give up > > reporting bugs, and contributing patches to Thrift where appropriate. > But, > > I think we also need to think realistically, and consider alternatives, > if > > Thrift development does not go in a direction which is favorable to > > Accumulo. > > > > So, with that in mind, any suggestions for alternatives? With pros/cons? > > > > >
Re: review board
Hey Mark,

Yup, we're still a CTR project. That should be captured on our governance page on the website, and changing it would require a VOTE by the PMC. We don't have any enforced mechanism for performing reviews. We used to use ReviewBoard a bit but, as of late, more happens on GitHub with the better integration that Infra has provided. By contrast, you'll find that some projects expressly state certain systems as the ones that must be used for code review. It's not been an issue in Accumulo.

Re: CTR in practice, we do still have a bit of review happening before commit -- it's up to the discretion of the committer. If it's not a trivial change, you'll likely see the committer waiting for someone else to take a look before pushing it. Low volume and decent test coverage help make this a tenable process.

On 11/1/17 12:28 PM, J. Mark Owens wrote: Hi, I'm going through a lot of the Accumulo documentation as I look at ACCUMULO-4714 and had a question about some of the information. Is the review board documentation page still up to date and accurate? I clicked the instance link (https://reviews.apache.org/) and noticed that the last entry for Accumulo is over a year old. Is this something that is still actively utilized, or should the information be revised in some manner? Is Accumulo still using a Commit-Then-Review policy, etc.? Thanks, Mark
Re: KerberosToken hell
Re #1: You don't actually need to do this unless you've disallowed anonymous connections to ZooKeeper. Anonymous access to ZK is sufficient for Accumulo clients.

Have you made any effort to find existing code in the Accumulo repository? For example [1]. The KerberosToken is nothing other than a thin object which ultimately states that Kerberos credentials are intended to be used for authentication. Accumulo provides no API for the acquisition or local storage of those credentials -- thus, it's not appropriate for Accumulo to provide an API to do this.

[1] https://github.com/apache/accumulo/blob/f81a8ec7410e789d11941351d5899b8894c6a322/test/src/main/java/org/apache/accumulo/test/functional/KerberosIT.java#L158-L177

On 10/27/17 1:58 PM, Jorge Machado wrote: So what is the best way to get an Accumulo Connector if the cluster is Kerberized? I have done the following:

1. Add a jaas.conf for ZooKeeper.
2. Create an Instance (that logs in via SASL to ZooKeeper).
3. Generate an AuthenticationToken from the KerberosToken class, which logs the user into the UGI but keeps the object state on the KerberosToken class.
4. Call UserGroupInformation.loginUserFromKeytab(...) -- this is needed because the Thrift client just gets the user from the UGI, but it is not there (because KerberosToken keeps the state).
5. Get the Connector, passing the token.

It would be nicer if we kept the state on the UGI instead of the KerberosToken. We could create a public method on KerberosToken that logs the user in via the UGI. What do you mean by side effects? Jorge Machado jo...@jmachado.me

Am 27.10.2017 um 18:31 schrieb Josh Elser <els...@apache.org>:

Nearly all components in the Hadoop ecosystem require you to perform a login with your credentials when writing Java code. The only exception I'm aware of is ZooKeeper, which can automatically perform a login via JAAS. Supporting automatic login via JAAS would be the best path forward here. Creating unique side effects around security credentials in Accumulo is a bad idea (which is why the method you're referring to on KerberosToken was marked as deprecated, so that we can eventually remove it).

On 10/27/17 12:06 PM, Jorge Machado wrote: Hi Guys, I just started developing an Accumulo client with Kerberos and SASL. It was hell to figure out that you need to call UserGroupInformation.loginUserFromKeytab(principal, keytab) yourself and only then call KerberosToken(principal, keytab), all because we deprecated the replaceCurrentUser option. Later on, when we get the Connector, this breaks apart, mainly because, for example, my keytab does not have the same user as the OS account where I'm developing. It would be nice to just log the user in. What are you guys thinking about this? Regards, Jorge
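For readers following this thread later, the login flow being described can be sketched as below. This is a minimal sketch against the Accumulo 1.x client API, not code from the Accumulo repository; the principal, keytab path, instance name, and ZooKeeper host are all placeholders, and an actual run requires a Kerberized Accumulo cluster.

```java
import java.io.File;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.KerberosToken;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedClient {
    public static Connector connect() throws Exception {
        String principal = "user@EXAMPLE.COM";               // placeholder principal
        File keytab = new File("/etc/security/user.keytab"); // placeholder keytab

        // 1. Log in to Kerberos yourself first; Accumulo does not acquire
        //    or store credentials on your behalf.
        UserGroupInformation.loginUserFromKeytab(principal, keytab.getAbsolutePath());

        // 2. KerberosToken is a thin marker saying "authenticate with the
        //    current Kerberos credentials"; it does not perform the login.
        KerberosToken token = new KerberosToken(principal);

        // 3. Point the client at the instance, with SASL enabled so the
        //    Thrift transport authenticates via GSSAPI.
        Instance instance = new ZooKeeperInstance(
            ClientConfiguration.loadDefault()
                .withInstance("accumulo")    // placeholder instance name
                .withZkHosts("zk1:2181")     // placeholder ZooKeeper quorum
                .withSasl(true));

        // 4. Get the Connector, passing the token.
        return instance.getConnector(principal, token);
    }
}
```

Note that, per Josh's point above, step 1 is deliberate: the client performs the UGI login itself rather than relying on a side effect of constructing the token.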
Re: Unable to drop the table
Please inspect the Accumulo Master log running on the host identified by the IP address in the warning message. Look for any Exceptions or ERROR messages reported in that log file. On 10/23/17 12:57 PM, raviteja@gmail.com wrote: I am getting an error whenever I try to drop the table. [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to ip-Xinternal: (0) for at least 120038 ms
Re: Draft Board Report for Oct 2017
Need to strike the "-Description goes here-". Otherwise, a pretty dry report, but I guess there really wasn't much to report either.

On 10/9/17 9:58 AM, Michael Wall wrote: The Apache Accumulo PMC decided to draft its quarterly board reports on the dev list. Here is a draft of our report, which is due by Wednesday, Oct 11. Please let me know if you have any suggestions; I plan to submit on the 11th. Mike

--

## Description:
- Description goes here- The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- There were no new releases during the current reporting period.
- The 4th annual Accumulo Summit will be held on Oct 16th in Columbia, MD. The PMC has approved the use of the Apache Accumulo trademark.

## Health report:
- The project remains healthy. Activity levels on mailing lists, git and JIRA remain constant.

## PMC changes:
- Currently 30 PMC members.
- Ivan Bella was added to the PMC on Tue Jul 11 2017

## Committer base changes:
- Currently 30 committers.
- Ivan Bella was added as a committer on Wed Jul 12 2017

## Releases:
- Last release was 1.7.3 on Sat Mar 25 2017

## Mailing list activity:
- Nothing significant in the figures

## JIRA activity:
- 43 JIRA tickets created in the last 3 months
- 36 JIRA tickets closed/resolved in the last 3 months
Re: [DISCUSS] Guava Dependencies
+1 for jar shading (favoring #3 too, when not intrusive). We stop "advertising" that we include Guava on the classpath, and it's no longer our problem. As the other part of the thread alludes, if Hadoop brings in a version, fine. Accumulo specifically should stop relying on something specific coming down from its dependencies and "control its own destiny". FWIW, HBase has been actively moving to this model and, IMO, it's been working well.

On 9/18/17 2:12 PM, Mike Miller wrote: Recently, tickets have been opened dealing with Guava in Accumulo (see ACCUMULO-4701 through 4704), in particular the use of Beta classes and methods. Use of Guava comes with a few warnings. From the Guava README:

1. APIs marked with the @Beta annotation at the class or method level are subject to change. They can be modified in any way, or even removed, at any time. If your code is a library itself (i.e. it is used on the CLASSPATH of users outside your own control), you should not use beta APIs, unless you repackage them (e.g. using ProGuard).
2. Deprecated non-beta APIs will be removed two years after the release in which they are first deprecated. You must fix your references before this time. If you don't, any manner of breakage could result (you are not guaranteed a compilation error).

I think it is worth a discussion on how to handle Guava dependencies going forward across the different versions of Accumulo. The goal would be to allow use of a newer version of Guava in client applications with the currently supported versions of Accumulo. Ideally, we could just eliminate any use of Beta Guava code, but there are Beta classes that are very useful, some of which we have already integrated into released Accumulo versions. There seem to be 3 ways to handle Guava dependencies:

1 - jar shading
2 - copy Guava code into Accumulo
3 - replace Guava code with standard Java

We may have to handle it differently with each version of Accumulo. For example, 1.8 has more widespread use of Beta-annotated code than 1.7.
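To illustrate what option 3 (replace Guava code with standard Java) typically looks like, here is a small sketch of common Guava-to-JDK swaps. The call sites are hypothetical examples, not actual Accumulo code; the point is that much of this replacement is mechanical on Java 8+.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class GuavaReplacements {

    // Guava: Preconditions.checkNotNull(name, "name must not be null")
    // JDK:   Objects.requireNonNull
    static String requireName(String name) {
        return Objects.requireNonNull(name, "name must not be null");
    }

    // Guava: Joiner.on(",").join(parts)
    // JDK:   String.join
    static String joinParts(List<String> parts) {
        return String.join(",", parts);
    }

    // Guava: Lists.newArrayList(Iterables.filter(parts, p -> !p.isEmpty()))
    // JDK:   streams with a filter
    static List<String> nonEmpty(List<String> parts) {
        return parts.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(requireName("accumulo"));
        System.out.println(joinParts(List.of("a", "b", "c")));
        System.out.println(nonEmpty(List.of("a", "", "b")));
    }
}
```

Beta-annotated utilities without a direct JDK equivalent are exactly the cases where option 1 (shading) or option 2 (copying the code in) would still be needed.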
Re: [DISCUSS] 1.8.2
Given my current understanding (captured in my most recent comment), I don't think it's a blocker. It doesn't cause any incorrectness in the system, just unnecessary work in a rare case (active master switches). If Mike has the time to dig into it some more, vetting some of the cases that I outlined wouldn't be a bad idea, but it's not a release blocker.

On 8/31/17 11:04 PM, Christopher wrote: https://github.com/apache/accumulo/pull/295 is likely a blocker bug, but I don't really know the full implications of the breakage to the replication system. It is currently marked under Mike Miller's ACCUMULO-4662, rather than a separate issue.

On Thu, Aug 31, 2017 at 9:57 AM Michael Wall wrote: You are correct Mike, my mistake. I was looking at https://issues.apache.org/jira/projects/ACCUMULO/versions/12339245. Click the "issues in progress". Thanks for keeping me honest.

On Thu, Aug 31, 2017 at 9:46 AM Mike Miller wrote: The only one I have open for 1.8.2 is https://issues.apache.org/jira/browse/ACCUMULO-4662. I will look around for any more spots in the code that need to be fixed, but I think it's pretty much done. Was this the other ticket you were talking about, Mike? https://issues.apache.org/jira/browse/ACCUMULO-4342. It's currently assigned to you.

On Thu, Aug 31, 2017 at 9:17 AM, Michael Wall wrote: Mike Miller has 2 tickets in progress, and the issue Keith mentioned is the only blocker I saw. Once those are complete, I am in favor of a 1.8.2 release. I am happy to do the release again and continue as the 1.8 release manager. I am also happy to help someone else do that. It is a patch release, but we typically still run the continuous ingest testing. Christopher, do we still have resources to do that?

On Wed, Aug 30, 2017 at 5:34 PM Keith Turner wrote: I am in favor of that after I finish fixing ACCUMULO-4669

On Wed, Aug 30, 2017 at 2:16 PM, ivan bella wrote: Is it time to consider talking about tagging a 1.8.2 release?
Re: [DISCUSS] GitBox
Ok, cool. Thanks for the clarification and sorry for the ignorance! +0

On 8/18/17 10:49 PM, Christopher wrote: Enabling GH issues is not automatic and would not accompany this change. We would have to explicitly request that, separately, if we want to do that in the future.

On Fri, Aug 18, 2017 at 10:30 PM Josh Elser <els...@apache.org> wrote: My biggest concern was the confusion around the enabling of GH issues that would accompany this. As long as we're not trying to do project management in two places concurrently, I don't care either way.

On 8/18/17 4:51 PM, Mike Drob wrote: What has changed about the state of Accumulo or GitBox since the last time we had this discussion? Not saying no here, curious as to why you think we should revisit though.

On Fri, Aug 18, 2017 at 3:36 PM, Mike Walch <mwa...@apache.org> wrote: I think we should revisit the discussion of using Apache GitBox for Accumulo. If you are unfamiliar with it, GitBox enables better GitHub integration for Apache projects. With GitBox, committers can label GitHub pull requests, squash and merge them using the GitHub UI, and close them if they become stale. I think a move to GitBox will help us do a better job of reviewing and merging pull requests so that contributions are looked at in a timely manner. The only downside to this move is that the git url for Accumulo will change. Does anyone have objections to this?