Re: [REPORT][DRAFT] Apache Accumulo April 2020

2020-04-06 Thread Josh Elser
Maybe clarify in the "issues" section that a resolution is in sight but 
it's not done yet. You imply this in other words, but being clear that 
the trademark issue is "acknowledged by the owner and the PMC is waiting 
on a fix by that owner" is helpful for someone who is moving through the 
report quickly.


On 4/6/20 1:43 PM, Michael Wall wrote:

The Apache Accumulo PMC decided to draft its quarterly board
reports on the dev list. Here is a draft of our report which is due
Wednesday, Apr 8, 1 week before the board meeting on
Wednesday, Apr 15. Please let me know if you have any feedback.  I'll post
it on Wed.

Some more detailed metrics are at
https://reporter.apache.org/wizard/statistics?accumulo,
which appears to require a committer login.

Mike


[REPORT] Accumulo - April 2020

## Description:

The Apache Accumulo sorted, distributed key/value store is a robust,
scalable, high-performance data storage system that features cell-based
access control and customizable server-side processing. It is based on
Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper,
and Thrift.

## Issues:

The Oct report listed a discussion about a trademark issue at
http://www.accumulodata.com
[1]. The owner is still trying to locate the right account for that site in
order to repoint it to https://accumulo.apache.org.

## Membership Data:

Apache Accumulo was founded 2012-03-20 (8 years ago)
There are currently 36 committers and 36 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Holly Keebler on 2019-08-08.
- No new committers. Last addition was Holly Keebler on 2019-08-09.
- Ongoing discussion about adding a new member.

## Project Activity:

- No new releases this quarter, although 1.10 is still in the works. The
1.10 release will also be our first LTS release [2].
- There was work done to improve the website generation using github
actions [3].
- The monthly "Hack Day" continues in Columbia MD. There was nothing of note
posted from the meetings in Jan [4] and Feb [5]. The March Hack Day had a few
notes [6].
- Some of the developers participated in a Slack call [7] on Mar 24; notes
were written to the mailing list [8].

## Community Health:

- Activity in the community is consistent.  There is less activity on the
mailing lists but more on GitHub issues and PRs [9].

[1]:
https://lists.apache.org/thread.html/514d3cf9162e72f4aa13be1db5d6685999fc83755695308a529de4d6@%3Cprivate.accumulo.apache.org
[2]:
https://lists.apache.org/thread.html/43f051404bc5f15cde8f971ccbdc4cf7b017cc014affd914c357eaad%40%3Cdev.accumulo.apache.org%3E
[3]:
https://lists.apache.org/thread.html/rc9dacacb7bafd1d2289cdfa67ab31d5f4c0c1c47eb1afc905d62ef77%40%3Cdev.accumulo.apache.org%3E
[4]:
https://lists.apache.org/thread.html/r873b186740d0c1c078edafbf0af4fab0158f85aabc74348cfdf8acc8%40%3Cdev.accumulo.apache.org%3E
[5]:
https://lists.apache.org/thread.html/r0c43fdc622d446a0f5cbec79085de86e8ad098a173a73739e86c98fd%40%3Cdev.accumulo.apache.org%3E
[6]:
https://lists.apache.org/thread.html/r3753f5ee8caba67fc00a4a6af36c75018349085f9c5fd7892ba7d7aa%40%3Cdev.accumulo.apache.org%3E
[7]:
https://lists.apache.org/thread.html/r494ba26ee4e8f16fc1b865bb363f3e4a9035738d8c49f10505d6e4f5%40%3Cdev.accumulo.apache.org%3E
[8]:
https://lists.apache.org/thread.html/r2ae8f3375fc2c2e36b11e576456b8697f29057c06d0bf89c6e165d14%40%3Cdev.accumulo.apache.org%3E
[9]: https://reporter.apache.org/wizard/statistics?accumulo



Re: accumulo trace from monitor

2020-03-23 Thread Josh Elser
Hadoop provides the CredentialProvider API as a way to keep
passwords from being stored in plaintext on your filesystem. This is
done via a JCEKS keystore file located somewhere on the local filesystem
or HDFS.

If I were to take a wild guess, I'd assume that the trace token
(username/principal and password) is stored using a
CredentialProvider, with Cloudera Manager probably managing that file for
you. The error you give almost looks like Accumulo can't find/extract
the password from the JCEKS file. However, if you have changed the
password, I'm not sure how CM would know that you changed it
"underneath" it (I can't imagine it could know this).
Regardless, I'm sure there is a way to fix the JCEKS file so that it
has the new password. Reaching out to a Cloudera rep (services or
support) would be a good idea.
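For anyone following along, here is a rough sketch of the lookup path being described. It assumes the Hadoop client libraries are on the classpath, and the property alias and JCEKS path below are purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: Configuration.getPassword() first consults any
// providers listed under hadoop.security.credential.provider.path (e.g. a
// JCEKS keystore) and only falls back to a plaintext config value if no
// provider holds the alias. If the keystore entry is stale, the lookup
// keeps returning the old password -- matching the symptom described above.
public class CredentialLookupSketch {
    static char[] lookupPassword(Configuration conf, String alias) throws Exception {
        // e.g. conf.set("hadoop.security.credential.provider.path",
        //     "jceks://file/etc/accumulo/conf/credentials.jceks");
        return conf.getPassword(alias);
    }
}
```

Fixing it would then mean recreating the alias in that keystore with the new password (the `hadoop credential` CLI can delete and create aliases), which is presumably what CM would need to do here.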

On Fri, Mar 13, 2020 at 1:08 PM marathiboy  wrote:
>
> Thanks,
>
>
>  I am using accumulo 1.9.2-cdh6.1.0 (installed using cloudera parcel)
>
>
>  as far as I know I didn't add anything related to credential provider and
>  when I searched for credential, I don't get any results back.
>
>
>  Thanks
>
>
>  S
>
>
>
> --
> Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html


Re: Replication-related IT failures

2020-01-31 Thread Josh Elser
I'm really upset that you think suggesting removal of the feature is 
appropriate.


More installations of HBase than not (which, IMO, should be considered 
Accumulo's biggest competitor) use replication. The only HBase users 
I see without a disaster recovery plan are developer-focused 
instances with zero uptime guarantees. I'll even go farther to say: any 
user who deploys a database into a production scenario would *require* a 
D/R solution for that database before it would be allowed to be called 
"production".


Yes, there are D/R solutions that can be implemented at the data 
processing layer, but this is almost always less ideal as the cost of 
reprocessing and shipping the raw data is much greater than what 
Accumulo replication could do.


While I am deflated that no other developers have seen this and shown any 
interest in helping work through bugs/issues, they are volunteers and I 
can only be sad about this. However, I will not let an argument which 
equates to "we should junk the car because it has a flat tire" go 
without response.


On 1/28/20 10:58 PM, Christopher wrote:

As succinctly as I can:

1. Replication-related IT have been flakey for a long time,
2. The feature is not actively maintained (critical, or at least,
untriaged issues exist dating back to 2014 in JIRA),
3. No volunteers have stepped up thus far to maintain them and make
them reliable or to develop/maintain replication,
4. I don't have time to fix the flakey ITs, and don't have interest or
use case for maintaining the feature,
5. The IT breakages interfere with build testing on CI servers and for releases.

Therefore:

A. I want to @Ignore the flakey ITs, so they don't keep interfering
with test builds,
B. We can re-enable the ITs if/when a volunteer contributes
reliability fixes for them,
C. If nobody steps up, we should have a separate conversation about
possibly phasing out the feature and what that would look like.

The conversation I suggest in "C" is a bit premature right now. I'm
starting with this email to see if any volunteers want to step up.

Even if somebody steps up immediately, they may not have a fix
immediately. So, if there's no objections, I'm going to disable the
flakey tests soon by adding the '@Ignore' JUnit annotation until a fix
is contributed, so they don't keep getting in the way of
troubleshooting other build-related issues. We already know they are
flakey... the constant failures aren't telling us anything new, so the
tests aren't useful as is.
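For concreteness, the proposed change amounts to something like the following. The class and method names here are illustrative, not a reference to a specific Accumulo IT:

```java
import org.junit.Ignore;
import org.junit.Test;

// Illustrative only: disabling a flakey replication IT so CI builds skip it
// until someone contributes a reliability fix. The @Ignore reason string
// keeps the context visible in build output.
public class SomeReplicationIT {

    @Ignore("Flakey; see dev-list discussion -- re-enable once stabilized")
    @Test
    public void dataWasReplicatedToThePeer() throws Exception {
        // ... existing test body left unchanged ...
    }
}
```

Re-enabling later is just a matter of deleting the annotation, so the change is easy to revert once a fix lands.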



Re: [LAZY][VOTE] A basic, but concrete, LTS proposal

2019-10-31 Thread Josh Elser

Seems fine to me.

Any expectations on how upgrades work within an LTS release? How about 
across LTS releases?


Some specific situations to mull over:

* Can we do a rolling upgrade within an LTS line (to a new patch version) with 
no downtime? (e.g. 1.9.1 to 1.9.3)
* Can any LTS release (1.9.1) be guaranteed to upgrade to a later LTS 
release (2.3.1)?
* What about rolling back within an LTS line (e.g. 2.3.2 back to 2.3.1 
after some bug is found)?


Not looking for immediate answers, but it would be good to define the 
expectations around what we want Accumulo to be able to do 
(ignoring the fact that bugs will certainly arise around 
upgrades/downgrades).


On 10/30/19 9:00 PM, Christopher wrote:

Following up from the discussion at
https://lists.apache.org/thread.html/560bfe8d911be5b829e6250a34dfa1ace0584b24251651be1c77d724@%3Cdev.accumulo.apache.org%3E

I think we should adopt this LTS concept:

LTS releases:
* Designate a new LTS line every 2 years (designation communicates
intent to support/patch)
* Target patch releases to LTS lines for 3 years
* EOL previous LTS line when the new one has been available for 1 year

non-LTS releases:
* Periodic releases that aren't expected to be supported with patch releases
* Can still create patch releases, but only until the next LTS/non-LTS
release line (typically only for critical bugs because we won't
keep a maintenance branch around for non-LTS... instead, we'll roll
bugfixes into the next release, or branch off the tag for a critical
bug)
* non-LTS releases are EOL as soon as the next LTS/non-LTS release
line is created

Transition plan:

* Define LTS on the downloads page of the website
* Designate 1.9 as first (and currently only) LTS release line
* Mark the LTS expected EOL date on the downloads page next to the LTS
releases (to the month... we don't need to get too granular/pedantic)

What this proposal does *not* do is determine how frequently we
release. It *only* determines which versions we will designate as LTS.
So, this doesn't bind us to any fixed release schedule, and we can
release as frequently (or infrequently) as our community wishes
(though I hope the non-LTS releases will occur more frequently, as
they can take more creative risks). But, the main point of this
proposal is that every two years, we'll designate a new release that
will take over as our main "supported line" that will be low-risk, and
more stable over time. The 1-year overlap for people to upgrade from
one LTS to the next in this plan is pretty useful, too, I think.

Here's an example set of hypothetical releases (except 1.9.x and
2.0.0, which are real) under this plan:

* LTS (2018): 1.9.0 -> 1.9.1 -> 1.9.2 -> ... -> EOL(2021)
* non-LTS (2018-2020): 2.0.0 -> 2.1.0 -> 2.1.1 (critical bug fix) -> 2.2.0
* LTS (2020): 2.3.0 -> 2.3.1 -> 2.3.2 -> ... -> EOL(2023)
* non-LTS (2020-2022): 2.4.0 -> 2.5.0 -> 3.0.0
* LTS (2022): 3.1.0 -> 3.1.1 -> 3.1.2 -> ... -> EOL(2025)

This LTS proposal isn't perfect and doesn't solve all possible issues,
but I think it establishes the groundwork for future release
plans/schedules and helps frame discussions about future releases,
that we can work through later if needed.

If there's general consensus on the basic proposal here, I can start
updating the website after 72 hours (lazy consensus) to add the LTS
definition and mark things on the downloads page, accordingly. If it
turns into a significant discussion, I'll hold off on anything until
the discussion points are resolved. If there's disagreement that can't
be resolved, I'll start a more formal vote later (or give up due to
lost motivation, worst case :smile:).

--
Christopher



WALs and HDFS (was Re: Accumulo on Azure - Long Term Monitoring)

2019-10-25 Thread Josh Elser
Forking this off because I don't think it's related to Tushar's original 
question.


HBase and Accumulo both implement a WAL, which can be said to rely on a 
distributed FileSystem that:


1. Is API-compatible with HDFS
2. Guarantees that data written prior to an hflush()/hsync() is durable

There are actually a few filesystems capable of this: HDFS (duh), 
Azure's Windows Azure Storage Blob (WASB), Azure's Data Lake Store 
(ADLS), and Azure's Blob Filesystem (ABFS).


Azure has had a pretty long collaboration with the upstream Hadoop project 
(and some ties with the HBase project) to make sure that we know how 
to configure their Hadoop drivers for those Azure blob stores 
so they provide that durability guarantee.


That said, it's wrong to say that HBase/Accumulo in a cloud solution 
require HDFS. It is accurate to say that S3 (via the S3A adapter) does 
not provide the durability guarantees that HBase/Accumulo need for WALs 
(though EMRFS does, from what I've heard through the grapevine, but it 
requires you to be using EMR).
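A minimal sketch of the durability contract in play, assuming the Hadoop client libraries are on the classpath (the class and path are illustrative):

```java
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: the WAL requirement boils down to this contract. Once
// hsync() returns, every byte written so far must survive a process or
// machine crash. A FileSystem implementation that silently buffers past
// hsync() cannot safely host HBase/Accumulo WALs.
public class WalDurabilitySketch {
    static void appendDurably(FileSystem fs, Path wal, byte[] entry) throws Exception {
        try (FSDataOutputStream out = fs.append(wal)) {
            out.write(entry);
            out.hsync(); // durable on the storage side, not just in OS buffers
        }
    }
}
```

This is why the list of qualifying filesystems above is short: the adapter has to honor hsync()/hflush() semantics, not merely expose an HDFS-shaped API.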


On 10/25/19 1:49 PM, David Mollitor wrote:

Hello Team,

One shortcoming of Apache Accumulo and Apache HBase, as I understand it,
is that they both rely on HDFS for replicated WAL management.
Therefore, HDFS is a requirement even if deploying to a cloud solution.  I
believe Google has developed a consensus enabled WAL management so that
three instances can be stood up without any external dependencies (other
than storage for the collection of rfile/hfile).

Be interested to hear your thoughts on this.

On Fri, Oct 25, 2019 at 1:46 PM Mike Miller  wrote:


Hi Tushar,

The closest thing we have are the performance tests in accumulo-testing,
which is probably the best place.
https://github.com/apache/accumulo-testing#performance-test
The instructions for setting up the scripts are in the README.  There are
only a limited number of tests written though and they used to be
integration tests that were moved out of the main test package.

org.apache.accumulo.testing.performance.tests.DurabilityWriteSpeedPT
org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT
org.apache.accumulo.testing.performance.tests.ScanExecutorPT
org.apache.accumulo.testing.performance.tests.ScanFewFamiliesPT
org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT
org.apache.accumulo.testing.performance.tests.RandomCachedLookupsPT

On Thu, Oct 24, 2019 at 8:09 PM Tushar Dhadiwal 
wrote:


Hello Everyone,


I am a Software Engineer at Microsoft and our team is currently working on
making the deployment and operations of Accumulo on Azure as seamless as
possible. As part of this effort, we are attempting to observe / measure
some standard Accumulo operations (e.g. scan, canary queries, ingest, etc.)
and how their performance varies over time on long-standing Accumulo
clusters running in Azure. As part of this we're looking to come up with a
metric that we can use to evaluate how healthy / available an Accumulo
cluster is. Over time we intend to use this to understand how underlying
platform changes in Azure can affect overall health of Accumulo workloads.



As a starting metric for example, we are thinking of continually doing
scans of random values across various tablet servers and capturing timing
information related to how long such scans take. I took a quick look at the
accumulo-testing repo and didn't find any tests or probes attempting to do
something along these lines. Does something like this seem reasonable? Has
anyone previously attempted something similar? Does accumulo-testing seem
like a reasonable place for code that attempts to do something like this?



Appreciate your thoughts and feedback.



Cheers,

Tushar Dhadiwal
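The probe loop being described could be sketched as below. This is not an existing accumulo-testing class; the names are illustrative, and the timed stand-in workload would be replaced by an Accumulo Scanner reading a random row:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the canary-scan idea: repeatedly time a probe operation and
// report nearest-rank percentile latencies. In a real probe,
// timedProbeMicros() would open a Scanner on a random row so the probe
// touches different tablet servers over time.
public class ScanLatencyProbe {

    static long timedProbeMicros(Random rnd) {
        long start = System.nanoTime();
        // Stand-in for the actual scan, e.g. scanner.setRange(Range.exact(row))
        long sink = 0;
        for (int i = 0; i < 1_000; i++) {
            sink += rnd.nextInt(10);
        }
        if (sink < 0) throw new IllegalStateException("unreachable");
        return (System.nanoTime() - start) / 1_000;
    }

    // p in (0, 1]; input must be sorted ascending (nearest-rank percentile)
    static long percentile(List<Long> sorted, double p) {
        int idx = (int) Math.ceil(p * sorted.size()) - 1;
        return sorted.get(Math.max(0, idx));
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        List<Long> latencies = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            latencies.add(timedProbeMicros(rnd));
        }
        Collections.sort(latencies);
        System.out.println("p50=" + percentile(latencies, 0.50) + "us, "
            + "p99=" + percentile(latencies, 0.99) + "us");
    }
}
```

Tracking p50/p99 over days rather than averages tends to surface the slow-tail regressions a long-running cluster health metric is meant to catch.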







Re: Help with PR 1333

2019-10-21 Thread Josh Elser
Also, just in case you're feeling this way, any kind of contribution 
that you want to put together is helpful, welcome and appreciated.


Please don't feel like you're unable to contribute because you can't get 
something "substantial" together. Sometimes it's the smallest or 
"silliest" changes that can make the biggest impact.


On 10/16/19 9:37 AM, David Mollitor wrote:

Hello Gang,

I work with a customer that uses Accumulo.  My full-time position is not in
development, so while I'm curious to look into Accumulo a bit, I can't make
substantial contributions at this time.  However, I do enjoy working on
things that I like to call "below the waterline."  Reviewing code,
documentation, and performing small clean-up tasks when and where I can.

With that said, I started looking at cleaning up code in the LRUCache.
However, it led me down a bit of a rabbit hole and I discovered that the
LRU cache is deleting more data than it needs to.  I've addressed that
issue in the same PR.

Is someone able to assist me in review and submission?

https://github.com/apache/accumulo/pull/1333


Thanks!



Re: rc2 test question

2019-08-06 Thread Josh Elser

Yeah, this has been sporadically failing since at least 1.7 days.

On 7/30/19 1:37 PM, Owens, Mark wrote:

Yep, after several continual failures it then started passing.

-Original Message-
From: Adam Lerman 
Sent: Tuesday, July 30, 2019 1:26 PM
To: dev@accumulo.apache.org
Subject: Re: rc2 test question

Mark,

I have seen that test class fail often both on personal machines and AWS 
instances.

I can't quantify as I haven't kept track of when it has happened, but I know 
it's that class.

Here is a link to some discussion that happened around that class on slack 
https://the-asf.slack.com/archives/CERNB8NDC/p1560797596014800?thread_ts=1560797596.014800=CERNB8NDC



Re: Accumulo Website question - I will add these steps to the README for Ubuntu/Pop users.

2019-06-07 Thread Josh Elser

Would be better to add:

```
$ gem install bundler
$ bundler install # should automatically install Jekyll for you
```

Using gem to install Jekyll installs it "globally" instead of local to 
your "bundle" (the accumulo website). This increases the likelihood that 
you have some version clash of Ruby dependencies on your local machine.


You would then run Jekyll via `bundle exec jekyll <..>` instead of just 
`jekyll`.


I don't see a reason why you would create a `jekyll` user. I suggest you 
drop that unless you have a reason.


Thanks for updating documentation around this.

On 6/7/19 2:14 PM, Jeffrey Zeiberg wrote:

Step 1: Installing Ruby
First, log into your server, then execute these commands:

sudo apt-get update
sudo apt-get install ruby-full make gcc nodejs build-essential patch
Step 2: Setting up Jekyll
This part is quite easy. Simply execute the following to install Jekyll 
and its dependencies using Gem:


gem install jekyll bundler
Now, create a user for it:

useradd jekyll




Re: NoSQL day summary

2019-05-24 Thread Josh Elser

(Ensuring it goes out to all lists, thanks Artem)

Also thank you to CCRi! Missed them in the original message as a sponsor.

On 5/24/19 4:24 PM, Josh Elser wrote:
(pardon the cross-post -- please reply-list unless there's a good reason 
to cross-post some more)


Hi,

While NoSQL day is fresh in my head, I wanted to share some general 
information about the event this past Tuesday.


We got started around 9:30 AM in D.C., yours truly welcoming everyone, 
followed by a fellow from Intel talking about some hardware they have 
coming and the work that Ram and Anoop have been doing about leveraging 
it in HBase (sadly, we didn't have them in person!). Two gents from 
Microsoft Azure got on stage to talk about Azure and the HBase and 
Phoenix support on HDInsight.


From there, we broke into two rooms, each of which held seven talks. We 
had lots of familiar faces, but also had some new faces (even for me!). 
After 5pm, we broke out some drinks and snacks and had a candid 
Q&A/panel session with a smattering of folks from each community. The 
audience gave us some questions to ask them, but I also tried to 
interject a few doozies to make them sweat.


All said and done, we had about 170 individuals registered, about 140 
folks showed up, and we had roughly 110 of them remaining by the end of 
the day. We were quite happy with these numbers, as the usual percentage of 
attendees to registrants is 20-30% lower than this.


Talks were recorded with their slide presentation. Editing/processing on 
these will take some time -- I'd expect a month before I'm able to get 
these posted on YouTube for everyone (but rest assured that it will 
happen).


All attendees should be receiving a survey to give us feedback about the 
event, but I'd also encourage anyone else who doesn't want to use the 
form to send me feedback directly. The hope is that we can keep this 
tradition going next year, but it's always a struggle. I can say that we 
could not have done this without the sponsorship of Bloomberg, Intel, 
Microsoft, Salesforce (and, of course, Cloudera). Thank you all very much.


- Josh




Talk submissions for NoSQL day?

2019-04-22 Thread Josh Elser
Coming back from vacation, I don't see any submissions from the pool of 
developers I'd normally expect to see. I was hoping that since this was 
in the "backyard" for most folks, we'd have enough talks to fill a 
room for a day.


Abstracts were set to close last Friday, but, best as I can see, they 
are still open now. If you were planning on submitting something, please 
do so ASAP.


https://dataworkssummit.com/nosql-day-2019/

At present, we barely have enough submissions to have a half-day of 
Accumulo content (if all submissions are accepted).


- Josh


Re: [DRAFT] [REPORT] Apache Accumulo - April 2019

2019-04-09 Thread Josh Elser
Some general questions that I suspect the board will ask on their own. If 
you want to proactively address them, it might head off some 
back-and-forth, but it's up to you.


* Any prospects on new C/PMC?
* An acknowledgement that no decisions are being made at the Hack Day 
and there will be some summary of relevant discussions on the dev list 
for those who were not physically present.


Looks like you covered all of the good stuff.

On 4/9/19 9:58 AM, Michael Wall wrote:

The Apache Accumulo PMC decided to draft our quarterly board
reports on the dev list. Here is a draft of our report which is due
by Wednesday, Apr 10, 1 week before the board meeting on
Wednesday, Apr 17. Please let me know if you have any feedback.

I will submit this tomorrow morning.  Sorry for the short timeline.

Mike



## Description:
  - The Apache Accumulo sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing.  It is based on
Google's BigTable design and is built on top of Apache Hadoop, Zookeeper,
and Thrift.

## Issues:
  - There are no issues requiring board attention at this time.

## Activity:
  - There was 1 new release, Accumulo-2.0.0-alpha-2, since the last
report [1].  Progress toward the 2.0.0 release continues.
  - A 1.9.3 release is in the works and expected soon [2].
  - The community once again started a monthly "Hack Day" in Columbia
MD that is open to all contributors [3].

## Health report:
  - The project remains healthy.  Activity levels on mailing lists, issues,
and pull requests remain constant.

## PMC changes:
  - Currently 34 PMC members.
  - No new PMC members added in the last 3 months
  - Last PMC addition was Nick Felts on Thu Mar 22 2018

## Committer base changes:
  - All Committers are also PMC members, see the PMC Changes section for
details

## Releases:
  - accumulo-2.0.0-alpha-2 was released on Wed Jan 30 2019

## Mailing list activity:
  - Nothing significant in the figures

## Issue activity:
  - 78 issues created [4] and 55 closed [5] across all the Accumulo repos
since the last report.
  - 153 pull requests created [6] and 152 closed [7] across all the Accumulo
repos since the last report.

[1]: https://accumulo.apache.org/release/accumulo-2.0.0-alpha-2/
[2]: https://accumulo.apache.org/release/accumulo-1.9.3/
[3]:
https://lists.apache.org/thread.html/9817962004326e233b8360f945420a3ffed4526f181098aaf4b76e66@%3Cdev.accumulo.apache.org%3E
[4]:
https://github.com/search?q=is:issue+created:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[5]:
https://github.com/search?q=is:issue+closed:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[6]:
https://github.com/search?q=is:pr+created:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp
[7]:
https://github.com/search?q=is:pr+closed:2019-01-17..2019-04-17+repo:apache/accumulo+repo:apache/accumulo-website+repo:apache/accumulo-examples+repo:apache/accumulo-docker+repo:apache/accumulo-testing+repo:apache/accumulo-wikisearch+repo:apache/accumulo-proxy+repo:apache/accumulo-maven-plugin+repo:apache/accumulo-pig+repo:apache/accumulo-instamo-archetype+repo:apache/accumulo-bsp



Re: [VOTE] Apache Accumulo 1.9.3-rc2

2019-04-01 Thread Josh Elser

Again, like I included earlier:

> (Append ".sha1", ".md5", or ".asc" to download the signature/hash for
a given artifact.)

On 4/1/19 1:56 PM, Christopher wrote:

In what way?

On Mon, Apr 1, 2019 at 1:54 PM Josh Elser  wrote:


Your email template is wrong.

On 4/1/19 1:33 PM, Christopher wrote:

Sorry, I don't understand what you mean by 'retelling of "checksums of old"'.

On Mon, Apr 1, 2019 at 12:30 PM Josh Elser  wrote:


I think Mike's point was your VOTE template does not reflect the
retelling of "checksums of old"

   > (Append ".sha1", ".md5", or ".asc" to download the signature/hash for
a given artifact.)

On 3/31/19 10:54 PM, Christopher wrote:

Mike,

We already stopped using md5 and sha1 for the release artifacts on the
mirrors. I did this some time ago, and we discussed it on list on
previous vote threads (last year)... which resulted in me changing the
release candidate build script automated tooling to embed the SHA512
sums for the tarballs directly in the release vote message. I even
went back and updated the downloads page for the previous releases and
updated the mirrors to be SHA512 only. Because of these steps I took,
Accumulo was one of the first projects across the entire ASF to be
100% compliant immediately after the INFRA VP updated the release
distribution policy you linked.

*This is a resolved action for Accumulo.*

FWIW, SHA512 was also used as the hash algorithm in the GPG signature
(same as every RC I've ever prepped for ASF). The only remaining md5
and sha1 reference are Maven-specific tooling, and we have no control
over that tooling. We could change the vote template to no longer
mention them, but I don't see the point since they're still relevant
within the context of Maven artifact hosting, and that's the context
in which they are presented in the vote email.

On Sun, Mar 31, 2019 at 1:59 PM Michael Wall  wrote:


-1 for the issue with commons config

I checked the signatures; they are good.  We should stop using md5 and sha1
though, see https://www.apache.org/dev/release-distribution#sigs-and-sums.
Has anyone looked at moving to sha256 and/or sha512?
Successful run of mvn clean verify -Psunny

On Sat, Mar 30, 2019 at 11:31 PM Keith Turner  wrote:


I completed a continuous ingest run on a 10 node cluster running
Centos 7.  I used the native map.  I had to rebuild Accumulo to work
around #1065 in order to get the verify M/R job to run.

   org.apache.accumulo.test.continuous.ContinuousVerify$Counts
   REFERENCED=34417110819
   UNREFERENCED=9097524

On Wed, Mar 27, 2019 at 7:57 PM Christopher  wrote:


Accumulo Developers,

Please consider the following candidate for Apache Accumulo 1.9.3.

This supersedes RC1 and contains the following change:
https://github.com/apache/accumulo/pull/1057

Git Commit:
   94f9782242a1f336e176c282f0f90063a21e361d
Branch:
   1.9.3-rc2

If this vote passes, a gpg-signed tag will be created using:
   git tag -f -m 'Apache Accumulo 1.9.3' -s rel/1.9.3 \
   94f9782242a1f336e176c282f0f90063a21e361d

Staging repo:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1077

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-src.tar.gz

Binary:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1077/org/apache/accumulo/accumulo/1.9.3/accumulo-1.9.3-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for
a given artifact.)

In addition to the tarballs, and their signatures, the following checksum
files will be added to the dist/release SVN area after release:
accumulo-1.9.3-src.tar.gz.sha512 will contain:
SHA512 (accumulo-1.9.3-src.tar.gz) =


b366b89295b1835038cb242f8ad46b1d8455753a987333f0e15e3d89749540f2cd59db1bc6cf7100fc9050d3d0bc7340a3b661381549d40f2f0223d4120fd809

accumulo-1.9.3-bin.tar.gz.sha512 will contain:
SHA512 (accumulo-1.9.3-bin.tar.gz) =


cc909296d9bbd12e08064fccaf21e81b754c183a8264dfa2575762c76705fd0c580b50c2b224c60feaeec120bd618fba4d6176d0f53e96e1ca9da0d9e2556f1f


Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release/accumulo-1.9.3/

Release testing instructions:
https://accumulo.apache.org/contributor/verifying-release

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.9.3 release of Apache Accumulo.

This vote will remain open until at least Sun Mar 31 00:00:00 UTC 2019.
(Sat Mar 30 20:00:00 EDT 2019 / Sat Mar 30 17:00:00 PDT 2019)
Voting can continue after this deadline until the r

Re: [VOTE] Apache Accumulo 1.9.3-rc2

2019-04-01 Thread Josh Elser

Your email template is wrong.

On 4/1/19 1:33 PM, Christopher wrote:

Sorry, I don't understand what you mean by 'retelling of "checksums of old"'.

On Mon, Apr 1, 2019 at 12:30 PM Josh Elser  wrote:


I think Mike's point was your VOTE template does not reflect the
retelling of "checksums of old"

  > (Append ".sha1", ".md5", or ".asc" to download the signature/hash for
a given artifact.)

On 3/31/19 10:54 PM, Christopher wrote:

Mike,

We already stopped using md5 and sha1 for the release artifacts on the
mirrors. I did this some time ago, and we discussed it on list on
previous vote threads (last year)... which resulted in me changing the
release candidate build script automated tooling to embed the SHA512
sums for the tarballs directly in the release vote message. I even
went back and updated the downloads page for the previous releases and
updated the mirrors to be SHA512 only. Because of these steps,
Accumulo was one of the first projects in the entire ASF to be
100% compliant immediately after the INFRA VP updated the release
distribution policy you linked.

*This is a resolved action for Accumulo.*

FWIW, SHA512 was also used as the hash algorithm in the GPG signature
(same as every RC I've ever prepped for ASF). The only remaining md5
and sha1 references are in Maven-specific tooling, and we have no control
over that tooling. We could change the vote template to no longer
mention them, but I don't see the point since they're still relevant
within the context of Maven artifact hosting, and that's the context
in which they are presented in the vote email.

On Sun, Mar 31, 2019 at 1:59 PM Michael Wall  wrote:


-1 for the issue with commons config

I checked the signatures; they are good.  We should stop using md5 and sha1
though, see https://www.apache.org/dev/release-distribution#sigs-and-sums.
Has anyone looked at moving to sha256 and/or sha512?
Successful run of mvn clean verify -Psunny

On Sat, Mar 30, 2019 at 11:31 PM Keith Turner  wrote:


I completed a continuous ingest run on a 10 node cluster running
CentOS 7.  I used the native map.  I had to rebuild Accumulo to work
around #1065 in order to get the verify M/R job to run.

  org.apache.accumulo.test.continuous.ContinuousVerify$Counts
  REFERENCED=34417110819
  UNREFERENCED=9097524






Re: Combining output of multiple filters/iterators

2019-03-29 Thread Josh Elser
You cannot feasibly hold onto some intermediate batch of nodes in 
memory. You're invalidating the general premise of how Accumulo 
iterators are meant to work in doing this. Further, an Iterator can 
_only_ safely operate within one row of a table. Two adjacent rows may 
be located on two different physical machines.


Would suggest you read through this presentation and try to take some 
time to understand why they did it this way: 
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf. 
You might also be able to take something from Shana Hutchison's work on 
Graphulo: https://arxiv.org/abs/1606.07085


On 3/29/19 2:20 PM, Enas Alkawasmi wrote:

Thank you for this suggestion. I have one question: can I pass options to the
new source that come from the result of the current iterator? The new
iterator needs to get the parent nodes from the current one; how can I
make the iterator wait for the result from its preceding iterator?



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html



Re: Adding additional default JVM parameters to accumulo-env.sh boosts performance and prevents crashing of TServers and Masters.

2019-03-22 Thread Josh Elser
Very rarely do JVM GC properties universally apply to all users and 
workloads.


I think it would be better to document why these options helped in your 
workload. Teaching folks how to choose the correct JVM properties for their 
workloads is a better way forward than encouraging folks to treat them 
as black boxes.


While this is extremely in-depth, I like the tone of this blog post: 
https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase. The 
authors explain what they observed from a system, what they changed, 
what effect that change should have, and, finally, the effect they 
actually observed.


On 3/22/19 9:28 AM, Jeffrey Zeiberg wrote:

Jeffrey Manno (ASRC Federal) and Jeffrey Zeiberg (ASRC Federal) have
discovered that adding a few new JVM options to the JAVA_OPTS set prevented
crashing and increased system performance.

They were added after line 68 in accumulo-env.sh.  They are:
'-server'
'-XX:+UseParallelOldGC'
'-XX:NewRatio=3'
'-XX:+AggressiveHeap'

The machines we are using are 7-year-old machines with 8G of main memory,
500G to 1T HDDs, and Intel i5 or i7 processors.
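As a concrete illustration, the change described above might look like the fragment below in accumulo-env.sh. This is a sketch: the placement, the array-style JAVA_OPTS syntax, and the flag spellings ('-server', '-XX:+AggressiveHeap' are the valid HotSpot forms) should be verified against your Accumulo version, and, per the caution elsewhere in this thread, these GC settings should be validated against your own workload.

```shell
# Hypothetical fragment of accumulo-env.sh showing where the extra GC
# options could be appended to the server JVM options. Tune (or drop)
# each flag for your own hardware and workload.
JAVA_OPTS=("${JAVA_OPTS[@]}"
  '-server'                 # prefer the server VM
  '-XX:+UseParallelOldGC'   # parallel collector for the old generation
  '-XX:NewRatio=3'          # size the old generation at 3x the new generation
  '-XX:+AggressiveHeap'     # let the JVM size the heap aggressively for the machine
)
```
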

Maybe these parameters should be added to the Accumulo 2.0 distribution's
accumulo-env.sh file?

Comments?



Re: Builds

2019-02-05 Thread Josh Elser
I feel like trying to put the Jenkinsfile in a separate branch might 
cause more headache than it's worth.


Did you happen to stumble on AW's write-up on a similar subject? 
https://effectivemachines.com/2019/01/24/using-apache-yetus-with-jenkins-and-github-part-1/


On 2/5/19 2:35 PM, Michael Wall wrote:

I think I see a path forward using a Jenkinsfile.  Please comment on this
plan if something doesn't make sense or someone does not agree.

1 - Create a new branch off master called jenkinsfile or such and add a
jenkinsfile that runs only unit tests but does it in docker.
2 - Create a new job on jenkins.revelc.net that only builds that branch
3 - Iterate on the jenkinsfile until it works cleanly
4 - Reconfigure the Accumulo-Master job on builds.apache.org to use the
Jenkinsfile

 From there it will hopefully be clear what to do next.  Maybe a
Jenkinsfile-IT or something.
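A minimal sketch of the Docker-wrapped build step such a Jenkinsfile stage might shell out to; the image tag, mount point, and Maven flags here are illustrative assumptions, not settled project tooling.

```shell
#!/usr/bin/env bash
# Sketch: run the unit-test build inside a throwaway Maven container so
# any leftover build processes die with the container rather than
# lingering on the Jenkins agent.
set -euo pipefail

docker run --rm \
  -v "$PWD":/workspace \
  -w /workspace \
  maven:3.6-jdk-8 \
  mvn -B clean verify -DskipITs
```

Because the Maven process lives and dies with the container, stray child processes left behind by builds (the original complaint about builds.apache.org) cannot outlive the job.
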

Christopher, looks like I already have permission on jenkins.revelc.net to
make jobs.

Thanks

Mike

On Sun, Jan 6, 2019 at 10:32 PM Christopher  wrote:


I've seen similar problems with processes left behind on my own
Jenkins instance, but my solution is to periodically log in and nuke
the processes, and even occasionally reboot the instance. I don't
think these options are available to us on builds.apache.org, because
we don't have direct access to them. I'm also not sure why it happens
or why Jenkins doesn't properly clean up child processes leftover from
no-longer-running builds it launched.

One suggestion was to run in Docker, but I don't know how to do that.
I attempted to do it that way, and got as far as Jenkins running and
connecting to the Docker instance, but the Maven versions available to
configure the Maven build does not seem to match what's inside the
Docker container and the job quickly fails so it's not clear how
to do a Maven build in Jenkins with Docker given the options INFRA has
made available to us. Perhaps if they had instructions, or an example
Maven job we could model our jobs after?


On Sun, Jan 6, 2019 at 9:34 PM Michael Wall  wrote:


Anyone look at this yet?


https://lists.apache.org/thread.html/e78b1b8ccaf11eb5cb557ec29d3208c3fec0450fd2b908b3f7922c56@%3Cbuilds.apache.org%3E


Not sure who even has karma to do anything here
https://builds.apache.org/view/A/view/Accumulo/






Re: How to Perform an True Update of a Record?

2019-01-23 Thread Josh Elser
Why are you trying to do this in the first place? When you write a new 
version of a cell, you are essentially replacing the old value. Leaving 
the old value in the table and lazily removing it (via compaction) is a 
core optimization to the write-path for Accumulo (from BigTable itself).
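The versioning behavior described above is controlled per table by the VersioningIterator. As a rough sketch (the table name "mytable" is hypothetical; the table.iterator.* property names follow the Accumulo user manual and should be verified for your release), adjusting or removing it from the Accumulo shell looks like:

```shell
# Sketch: Accumulo shell commands controlling the VersioningIterator for a
# hypothetical table "mytable". The comments are illustrative; the shell
# itself takes only the config commands.

# Keep the 3 newest versions of each cell instead of the default 1:
config -t mytable -s table.iterator.scan.vers.opt.maxVersions=3
config -t mytable -s table.iterator.minc.vers.opt.maxVersions=3
config -t mytable -s table.iterator.majc.vers.opt.maxVersions=3

# Or delete the iterator settings entirely so all versions are retained:
config -t mytable -d table.iterator.scan.vers
config -t mytable -d table.iterator.minc.vers
config -t mytable -d table.iterator.majc.vers
```

Note this only controls how many versions survive scans and compactions; it does not turn a write into an in-place update, which is the point being made above.
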


I'm having a hard time understanding why you're trying to do what you're 
asking.


On 1/23/19 1:26 PM, gtotsline wrote:

Hi Mike -

Thanks for responding so quickly; it's greatly appreciated.  The
ConditionalWriter does not appear to address our use case, where we actually
want to suppress Accumulo versioning of a record based on the value in the
record that was read vs. the input data our system receives.  Is there a way to
dynamically suppress Accumulo record versioning?



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html



Re: 2.0.0-alpha-2

2019-01-17 Thread Josh Elser

YCSB is probably the easiest thing to do workload-specific comparisons.

On 1/16/19 11:19 AM, Mike Miller wrote:

I think we can start doing that now with the alpha release, I am just not
sure how.  Did you have any ideas?

On Wed, Jan 16, 2019 at 12:31 AM Sean Busbey 
wrote:


Has anyone gotten to do a perf comparison to 1.9 yet? The time to do
that would be during beta I guess?

On Tue, Jan 15, 2019 at 5:18 PM Christopher  wrote:


I'm planning to prepare a 2.0.0-alpha-2-rc1 Thursday. So, merge your
stuff if you want me to include it then.
Depending on the quality of this alpha, we may want to do a beta (with
an API freeze?)... or just release 2.0.0 next. I wouldn't think we'd
want too many alphas.




--
busbey





Re: [DRAFT] [REPORT] Apache Accumulo - Jan 2019

2019-01-07 Thread Josh Elser



On 1/7/19 11:26 AM, Michael Wall wrote:

Hi Josh, thanks for reviewing.

The "PMC Changes" was copied directly from the reporter.apache.org
template.  The "Committer base changes" was in response to feedback from
the board several reports ago asking if all committers were PMC members.  I
could say something like "All Committers are also PMC members, see the PMC
Changes section for details".  Is that what you mean by more explicit?  Or
do you mean something else?


I had initially thought just copy/paste'ing the text from the PMC 
additions was good, but, on second thought, suggesting that readers look 
at the PMC additions section in the Committer additions section is just 
as good.


Thanks for the careful attention, Mike!


Re: Slack for Accumulo

2018-12-10 Thread Josh Elser

Just made one -- I only saw Mike Wall on slack so far (invited him).

On 12/10/18 4:31 PM, Michael Wall wrote:

Yeah, there are some Apache projects that use slack.  Can you create an
Accumulo channel at https://the-asf.slack.com?  I don't recall what I did
to set up my account there, but I did have to use my apache.org email
account.  The only thing I see from other communities is a small delay for
users since they need to be given access to the channel.

+1 to shutting down IRC.

Mike

On Mon, Dec 10, 2018 at 4:11 PM Christopher  wrote:


We did have a HipChat room, but we didn't advertise it well and nobody
really used it. With all that happened with HipChat being bought by
Atlassian and then sold off to Slack (or something like that), I'm not sure
where ASF Infra currently stands on providing chat as an ASF resource.

I have no objections to trying out Slack within the project. However, if
Infra does move towards something in future (like an official ASF Slack
group), we should try to follow whatever path they pave.

Regardless of whether Slack pans out, we should probably shut down the IRC
room... hardly anybody uses it, and the few that do are lurkers only or
bots. It doesn't make sense to continue to advertise it as a way for
users/devs to contact us.

On Mon, Dec 10, 2018, 15:44 Mike Walch  wrote:

I would like to create a Slack chatroom for Accumulo and advertise it on
our 'contact us' page [1] with an invite link to make it easy to join. Is
anyone opposed to this?  There will be no requirement that Accumulo users
or developers use Slack. We currently have an IRC chatroom but it's not
used much. I think Slack will be used more as it's simpler and saves the
latest history which helps if you join an ongoing discussion.  If it ends
up being rarely used over the next few months, I am OK with shutting it
down.

[1]: https://accumulo.apache.org/contact-us/







Re: commons-vfs2.jar 2.2 buggy

2018-10-24 Thread Josh Elser

It seems like commons-vfs2 is just a pile of crap.

It's been known to have bugs for years and we've seen zero progress from 
them on making them better.


IMO, rip the whole damn thing out.

On 10/24/18 12:42 PM, Matthew Peterson wrote:

Hello Accumulo,

Summary: commons-vfs2 version 2.2 seems to have problems and it may be
worth rolling back to version 2.1 of commons-vfs2.

My project upgraded a system from Accumulo 1.8.1 to 1.9.2.  Immediately
after switching vfs contexts we saw problems.  The tservers would error in
iterators about missing classes that were clearly on the classpath.  The
problems were persistent until we replaced the commons-vfs2.jar with
version 2.1 (Accumulo 1.9.2 uses version 2.2).  Until we rolled vfs back,
we received errors particularly with Spring code trying to access various
classes and files within the jars.  It looks like in 2.2, commons-vfs
implemented a doDetach method which closed the zip files.  We suspect that
code is the problem but haven't tested that theory.

I suspect that most users don't use this feature.

Thanks!
Matt



Re: [DISCUSS] 2.0.0-alpha?

2018-10-10 Thread Josh Elser

On 10/9/18 2:10 PM, Keith Turner wrote:

On Tue, Oct 9, 2018 at 1:52 PM Keith Turner  wrote:


On Tue, Oct 9, 2018 at 12:53 PM Josh Elser  wrote:




On 10/9/18 12:44 PM, Keith Turner wrote:

On Sat, Oct 6, 2018 at 12:27 AM Christopher  wrote:


Hi Accumulo devs,

I'm thinking about initiating a vote next week for a 2.0.0-alpha
release, so we can have an official ASF release (albeit without the
usual stability expectations as a normal release) to be available for
the upcoming Accumulo Summit.

An alpha version would signal our progress towards 2.0.0 final, serve
as a basis for testing, and give us something to share with a wider
audience to solicit feedback on the API, configuration, and module
changes. Of course, it would still have to meet ASF release
requirements... like licensing and stuff, and it should essentially
work (so people can actually run tests), but in an alpha release, we
could tolerate flaws we wouldn't in a final release.

Ideally, I would have preferred a 2.0.0 final at this point in the
year, but I think it needs more testing.

Does an alpha release next week seem reasonable to you?



I am in favor of an Alpha release.  Also, Alpha releases imply feature
freeze in some projects.  I am in favor of feature freeze.  Is anyone
opposed to feature freeze?

Below is what feature freeze means to me.

We agree to avoid adding new features for 2.0 AND work on 2.0 will
focus on bug fixes and polishing features added before the Alpha.
This polishing work could result in API changes.  If anyone really
wants to add a new feature, they should discuss it on the mailing
list.


No concerns with an alpha also implying a feature-freeze. That does mean
that it should be even more straightforward to have a complete list of
the features landing in 2.0.0 ;) (which remains my only concern)


Are you concerned about not completing the release notes before an
alpha vote?  Or is your concern something else?


Personally, I would like to see the release notes completed before
2.0.0-alpha is announced.  I can't think of compelling reasons to
complete it earlier than that.  However, it seems critical to complete
them before announcing.



It's in the same line of thinking that Sean stated:

> "I'd really like us to put 2.0 GA readiness in terms of feature /
correctness goals rather than a strict time limit."

Such a major release like 2.0 without clear reasons why users should 
care strikes me as very "so what?".


Re: [DISCUSS] 2.0.0-alpha?

2018-10-09 Thread Josh Elser





No concerns with an alpha also implying a feature-freeze. That does mean 
that it should be even more straightforward to have a complete list of 
the features landing in 2.0.0 ;) (which remains my only concern)


Re: [DISCUSS] 2.0.0-alpha?

2018-10-09 Thread Josh Elser

Ah, yes. I think you're right. Thanks again :)

On 10/9/18 12:32 PM, Mike Miller wrote:

Didn't RFile summaries show up in 1.9 too? (maybe I'm inventing that)


I think you are thinking of Sampling; that was released in 1.8.0, showing
up in 1.9.  I still get them confused.  They both are similar and start
with S.



Re: [DISCUSS] 2.0.0-alpha?

2018-10-09 Thread Josh Elser

Thanks, Mike.

Didn't RFile summaries show up in 1.9 too? (maybe I'm inventing that)

On 10/9/18 11:39 AM, Mike Miller wrote:

I think once we collect all the changes in 2.0 (there are a lot) we will be
able to create some bullet points, picking out changes most interesting to
users. The new bulk import process Keith, Mark and I worked on should be
one.  There are many new features that come along with it that weren't
possible.  There was all the work Mike did for usability that he is
presenting at the summit and wrote a blog post about 2 years ago:
https://accumulo.apache.org/blog/2016/11/16/simpler-scripts-and-config.html
Rfile Summaries was a big change but happened a while ago.  Recently, the
new Crypto service and new AccumuloClient builder are some other features
that come to mind.


On Mon, Oct 8, 2018 at 9:05 PM Josh Elser  wrote:


Frankly, planning a release without even an idea of what is going into it
seems like a waste of time to me.

I didn't ask these questions to try to squash such a release; I don't think
they're particularly difficult to figure out. Just curious what the release
notes would look like (as a user, this is what I would care about). I don't
think I'm alone.

On Mon, Oct 8, 2018, 19:33 Christopher  wrote:


I don't know the answers to these questions. I just want to put a
stake in the ground before the Accumulo Summit, so we have a basis for
evaluation and testing, and answering some of these unknowns.
On Mon, Oct 8, 2018 at 11:28 AM Josh Elser  wrote:


I would like to know what the scope of 2.0 is. Specifically:

* What's new in this 2.0 alpha that is driving the release?
* Is there anything else expected to land post-alpha/pre-GA?

On 10/6/18 1:36 PM, Sean Busbey wrote:

yes alphas please. Do we want to talk about expectations on time
between alpha releases? What kind of criteria for beta or GA?

a *lot* has changed in the 2.0 codebase.
On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman 

wrote:


+1

In addition to the reasons stated by Christopher, I think that it

also provides a clearer signal to earlier adopters that the public API
*may* change before the formal release. With a formal release candidate,

I

interpret that it signals that only bug-fixes would occur up and until

the

formal release.


With the length of time that we take between minor and patch

releases, the even longer time that it takes the customer base to upgrade
and development cost that we have supporting multiple branches, taking

some

extra time now to solicit feedback seems prudent. While the specifics and
implications of semver are clear, sometimes it seems that there is
additional weight and additional perceived risk when changing major
versions, an alpha version preserves our flexibility while still moving
forward.


Ed Coleman

-Original Message-
From: Christopher [mailto:ctubb...@apache.org]
Sent: Saturday, October 06, 2018 12:28 AM
To: accumulo-dev 
Subject: [DISCUSS] 2.0.0-alpha?

Hi Accumulo devs,

I'm thinking about initiating a vote next week for a 2.0.0-alpha
release, so we can have an official ASF release (albeit without the usual
stability expectations as a normal release) to be available for the
upcoming Accumulo Summit.

An alpha version would signal our progress towards 2.0.0 final, serve
as a basis for testing, and give us something to share with a wider
audience to solicit feedback on the API, configuration, and module changes.
Of course, it would still have to meet ASF release requirements... like
licensing and stuff, and it should essentially work (so people can actually
run tests), but in an alpha release, we could tolerate flaws we wouldn't in
a final release.


Ideally, I would have preferred a 2.0.0 final at this point in the
year, but I think it needs more testing.


Does an alpha release next week seem reasonable to you?

Christopher












Re: [DISCUSS] 2.0.0-alpha?

2018-10-08 Thread Josh Elser
Frankly, planning a release without even an idea of what is going into it
seems like a waste of time to me.

I didn't ask these questions to try to squash such a release; I don't think
they're particularly difficult to figure out. Just curious what the release
notes would look like (as a user, this is what I would care about). I don't
think I'm alone.

On Mon, Oct 8, 2018, 19:33 Christopher  wrote:

> I don't know the answers to these questions. I just want to put a
> stake in the ground before the Accumulo Summit, so we have a basis for
> evaluation and testing, and answering some of these unknowns.
> On Mon, Oct 8, 2018 at 11:28 AM Josh Elser  wrote:
> >
> > I would like to know what the scope of 2.0 is. Specifically:
> >
> > * What's new in this 2.0 alpha that is driving the release?
> > * Is there anything else expected to land post-alpha/pre-GA?
> >
> > On 10/6/18 1:36 PM, Sean Busbey wrote:
> > > yes alphas please. Do we want to talk about expectations on time
> > > between alpha releases? What kind of criteria for beta or GA?
> > >
> > > a *lot* has changed in the 2.0 codebase.
> > > On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman  wrote:
> > >>
> > >> +1
> > >>
> > >> In addition to the reasons stated by Christopher, I think that it
> also provides a clearer signal to earlier adopters that the public API
> *may* change before the formal release. With a formal release candidate, I
> interpret that it signals that only bug fixes would occur up until the
> formal release.
> > >>
> > >> With the length of time that we take between minor and patch
> releases, the even longer time that it takes the customer base to upgrade
> and development cost that we have supporting multiple branches, taking some
> extra time now to solicit feedback seems prudent. While the specifics and
> implications of semver are clear, sometimes it seems that there is
> additional weight and additional perceived risk when changing major
> versions, an alpha version preserves our flexibility while still moving
> forward.
> > >>
> > >> Ed Coleman
> > >>
> > >> -Original Message-
> > >> From: Christopher [mailto:ctubb...@apache.org]
> > >> Sent: Saturday, October 06, 2018 12:28 AM
> > >> To: accumulo-dev 
> > >> Subject: [DISCUSS] 2.0.0-alpha?
> > >>
> > >> Hi Accumulo devs,
> > >>
> > >> I'm thinking about initiating a vote next week for a 2.0.0-alpha
> release, so we can have an official ASF release (albeit without the usual
> stability expectations as a normal release) to be available for the
> upcoming Accumulo Summit.
> > >>
> > >> An alpha version would signal our progress towards 2.0.0 final, serve
> as a basis for testing, and give us something to share with a wider
> audience to solicit feedback on the API, configuration, and module changes.
> Of course, it would still have to meet ASF release requirements... like
> licensing and stuff, and it should essentially work (so people can actually
> run tests), but in an alpha release, we could tolerate flaws we wouldn't in
> a final release.
> > >>
> > >> Ideally, I would have preferred a 2.0.0 final at this point in the
> year, but I think it needs more testing.
> > >>
> > >> Does an alpha release next week seem reasonable to you?
> > >>
> > >> Christopher
> > >>
> > >
> > >
>


Re: [DISCUSS] 2.0.0-alpha?

2018-10-08 Thread Josh Elser

I would like to know what the scope of 2.0 is. Specifically:

* What's new in this 2.0 alpha that is driving the release?
* Is there anything else expected to land post-alpha/pre-GA?

On 10/6/18 1:36 PM, Sean Busbey wrote:

yes alphas please. Do we want to talk about expectations on time
between alpha releases? What kind of criteria for beta or GA?

a *lot* has changed in the 2.0 codebase.
On Sat, Oct 6, 2018 at 11:45 AM Ed Coleman  wrote:


+1

In addition to the reasons stated by Christopher, I think that it also provides 
a clearer signal to earlier adopters that the public API *may* change before 
the formal release. With a formal release candidate, I interpret that it 
signals that only bug fixes would occur up until the formal release.

With the length of time that we take between minor and patch releases, the even 
longer time that it takes the customer base to upgrade and development cost 
that we have supporting multiple branches, taking some extra time now to 
solicit feedback seems prudent. While the specifics and implications of semver 
are clear, sometimes it seems that there is additional weight and additional 
perceived risk when changing major versions, an alpha version preserves our 
flexibility while still moving forward.

Ed Coleman

-Original Message-
From: Christopher [mailto:ctubb...@apache.org]
Sent: Saturday, October 06, 2018 12:28 AM
To: accumulo-dev 
Subject: [DISCUSS] 2.0.0-alpha?

Hi Accumulo devs,

I'm thinking about initiating a vote next week for a 2.0.0-alpha release, so we 
can have an official ASF release (albeit without the usual stability 
expectations as a normal release) to be available for the upcoming Accumulo 
Summit.

An alpha version would signal our progress towards 2.0.0 final, serve as a 
basis for testing, and give us something to share with a wider audience to 
solicit feedback on the API, configuration, and module changes. Of course, it 
would still have to meet ASF release requirements... like licensing and stuff, 
and it should essentially work (so people can actually run tests), but in an 
alpha release, we could tolerate flaws we wouldn't in a final release.

Ideally, I would have preferred a 2.0.0 final at this point in the year, but I 
think it needs more testing.

Does an alpha release next week seem reasonable to you?

Christopher






Re: LoadPlanTest unit test failure on master

2018-09-14 Thread Josh Elser

Nevermind, Christopher has already fixed this, it seems.

On 9/14/18 10:10 AM, Josh Elser wrote:

I'll tag you in a PR. Trivial fix. Don't sweat it :)

On 9/14/18 10:09 AM, Keith Turner wrote:

I will look into it.  This branch was outstanding for a long time.
Yesterday I merged it, resolved conflicts, and then only ran mvn
compile.  I should have run mvn verify again and waited.

On Fri, Sep 14, 2018 at 9:11 AM, Josh Elser  wrote:

Color me surprised: this fails for me out of the box on
7ef140ec40c3768859b848350db8c6d6d20f7a56

[INFO] Running org.apache.accumulo.core.data.LoadPlanTest
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest
[ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest)  Time elapsed:
0.074 s  <<< FAILURE!
java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null,
f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null,
f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null,
f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc,
f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but
was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null,
f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null,
f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null,
f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb,
fb.rf:TABLE:heg:klt]>
 at
org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93)

@Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet
dug into the test (seems to be new code since I've touched Accumulo),
lobbing this as a softball for now and will investigate as time allows.


Re: LoadPlanTest unit test failure on master

2018-09-14 Thread Josh Elser

I'll tag you in a PR. Trivial fix. Don't sweat it :)

On 9/14/18 10:09 AM, Keith Turner wrote:

I will look into it.  This branch was outstanding for a long time.
Yesterday I merged it, resolved conflicts, and then only ran mvn
compile.  I should have run mvn verify again and waited.

On Fri, Sep 14, 2018 at 9:11 AM, Josh Elser  wrote:

Color me surprised: this fails for me out of the box on
7ef140ec40c3768859b848350db8c6d6d20f7a56

[INFO] Running org.apache.accumulo.core.data.LoadPlanTest
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest
[ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest)  Time elapsed:
0.074 s  <<< FAILURE!
java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null,
f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null,
f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null,
f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc,
f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> but
was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, f9.rf:TABLE:xxx:null,
f2.rf:FILE:abc:def, f3.rf:FILE:368:479, f7.rf:TABLE:www:null,
f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, fd.rf:TABLE:null:null,
f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, f6.rf:TABLE:null:bbb,
fb.rf:TABLE:heg:klt]>
 at
org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93)

@Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't yet
dug into the test (seems to be new code since I've touched Accumulo),
lobbing this as a softball for now and will investigate as time allows.


Re: LoadPlanTest unit test failure on master

2018-09-14 Thread Josh Elser

Nevermind, this is silly:

s/TABLET/TABLE/

On 9/14/18 9:11 AM, Josh Elser wrote:
Color me surprised: this fails for me out of the box on 
7ef140ec40c3768859b848350db8c6d6d20f7a56


[INFO] Running org.apache.accumulo.core.data.LoadPlanTest
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest
[ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest)  Time 
elapsed: 0.074 s  <<< FAILURE!
java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, 
f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, 
f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, 
f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, 
f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> 
but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, 
f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, 
f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, 
fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, 
f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]>
 at 
org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93)


@Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't 
yet dug into the test (seems to be new code since I've touched 
Accumulo), lobbing this as a softball for now and will investigate as 
time allows.


LoadPlanTest unit test failure on master

2018-09-14 Thread Josh Elser
Color me surprised: this fails for me out of the box on 
7ef140ec40c3768859b848350db8c6d6d20f7a56


[INFO] Running org.apache.accumulo.core.data.LoadPlanTest
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
0.168 s <<< FAILURE! - in org.apache.accumulo.core.data.LoadPlanTest
[ERROR] testTypes(org.apache.accumulo.core.data.LoadPlanTest)  Time 
elapsed: 0.074 s  <<< FAILURE!
java.lang.AssertionError: expected:<[f5.rf:TABLET:yyy:null, 
f3.rf:DATA:368:479, f2.rf:DATA:abc:def, f7.rf:TABLET:www:null, 
f9.rf:TABLET:xxx:null, fb.rf:TABLET:heg:klt, fd.rf:TABLET:null:null, 
f6.rf:TABLET:null:bbb, fc.rf:TABLET:agt:ctt, f8.rf:TABLET:null:ccc, 
f1.rf:DATA:1112:1145, f4.rf:TABLET:null:aaa, fa.rf:TABLET:1138:1147]> 
but was:<[f5.rf:TABLE:yyy:null, f4.rf:TABLE:null:aaa, 
f9.rf:TABLE:xxx:null, f2.rf:FILE:abc:def, f3.rf:FILE:368:479, 
f7.rf:TABLE:www:null, f8.rf:TABLE:null:ccc, fa.rf:TABLE:1138:1147, 
fd.rf:TABLE:null:null, f1.rf:FILE:1112:1145, fc.rf:TABLE:agt:ctt, 
f6.rf:TABLE:null:bbb, fb.rf:TABLE:heg:klt]>
	at 
org.apache.accumulo.core.data.LoadPlanTest.testTypes(LoadPlanTest.java:93)


@Keith, I see that HEAD is a commit which touches LoadPlanTest. Haven't 
yet dug into the test (seems to be new code since I've touched 
Accumulo), lobbing this as a softball for now and will investigate as 
time allows.


Re: [DRAFT] [REPORT] Apache Accumulo - July 2018

2018-07-09 Thread Josh Elser




On 7/9/18 4:54 PM, Michael Wall wrote:

Josh, I am not clear on what you are suggesting for another action item.
Are you suggesting a pass over contributors to add see if anyone should be
invited to become a committer or are you suggesting we revisit that every
committer becomes a PMC member?


If we say that we have not added any new committers/PMC members in a long 
period of time, the board will most assuredly say "Have you looked at 
your contributors to see if you should invite some to be committers?"


I was trying to suggest that we proactively tell them "we know we need 
to see if there are contributors to invite to be committers" in order to 
save that middle step. No need to do this -- just trying to be helpful 
based on what I see over and over again from the board :)


Re: [DRAFT] [REPORT] Apache Accumulo - July 2018

2018-07-05 Thread Josh Elser
+1 definitely. Especially since the organizers went through the proper 
TM approval steps.


I'd suggest expanding that, in addition to no new committers/PMC 
members, to include an action item. e.g. Should we make a pass over 
contributors? Or, is participation "constant" (c=pmc makes this a bit 
easier to put into words ;))


On 7/1/18 5:06 PM, Mike Drob wrote:

Worth mentioning upcoming summit?

On Sun, Jul 1, 2018, 3:54 PM Michael Wall  wrote:


The Apache Accumulo PMC decided to draft its quarterly board
reports on the dev list. Here is a draft of our report which is due
by Wednesday, Jul 11, 1 week before the board meeting on
Wednesday, Jul 18. Please let me know if you have any suggestions.

I am a little earlier with this one since I will be on vacation 5-15 Jul.
My plan is to submit this report sometime on Mon, Jul 9.

Mike

--

## Description:

  - The Apache Accumulo sorted, distributed key/value
  store is a robust, scalable, high performance data storage system that
  features cell-based access control and customizable server-side
  processing.  It is based on Google's BigTable design and is built on
  top of Apache Hadoop, Zookeeper, and Thrift.

## Issues:

  - There are no issues requiring board attention at this time.

## Activity:

  - There were 2 new releases, Accumulo 1.9.0 and Accumulo 1.9.1 since
  the last report.  The 1.9.0 release [1] had a critical bug in the Write
  Ahead Log (WAL) process that is fixed in 1.9.1 [2].
  - There were no new committers since the last report.  All committers
  are also PMC members.
  - The PMC decided to switch to using GitHub issues for the project and all
  subprojects [3], which is why the Jira activity dropped off.  Github
issues
  and pull request statistics are included below.
  - Another bug has been found in the WAL process and a 1.9.2 is in the
works.

## Health report:

  - The project remains healthy.  Activity levels on mailing lists, git
  and JIRA remain constant.

## PMC changes:

  - Currently 34 PMC members.
  - No new PMC members added in the last 3 months
  - Last PMC addition was Nick Felts on Thu Mar 22 2018

## Committer base changes:

  - See PMC changes, all committers are PMC members currently.

## Releases:

  - accumulo-1.9.0 was released on Tue Apr 17 2018
  - accumulo-1.9.1 was released on Sun May 13 2018

## Mailing list activity:

  - Nothing significant in the figures

## Issue activity:

  - 25 issues opened and 19 closed in the last 3 months [4]
  - 10 pull requests opened and 58 closed in the last 3 months [5]

[1]: http://accumulo.apache.org/release/accumulo-1.9.0/
[2]: http://accumulo.apache.org/release/accumulo-1.9.1
[3]:
http://accumulo.apache.org/blog/2018/03/16/moving-to-github-issues.html
[4]:

https://github.com/apache/accumulo/issues?utf8=%E2%9C%93&q=is%3Aissue+created%3A%3E2018-04-18+
[5]:

https://github.com/apache/accumulo/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+created%3A%3E2018-04-18+





Re: [DISCUSS] Draft release timeline for 2.0.0

2018-06-12 Thread Josh Elser




On 6/12/18 1:20 AM, Christopher wrote:

On Mon, Jun 11, 2018 at 10:46 PM Josh Elser  wrote:


I'm just trying to point out the fallacy of meeting deadlines when the
criteria for "success" is undefined.



Why? I proposed the timeline to solicit opinions on it. Use whatever
subjective criteria you want to inform your own. If you have criteria that
you think won't be satisfied within that timeline, then raise them for
discussion.


Again, I am stating that a timeline with no recognition of what work 
needs to be done is silly. Yes, you can draw a line in the sand for when 
you want work to be done, but that's ineffective in making an actionable 
feature complete date.


If you want the date to be meaningful, you need to understand what work 
actually _has_ to be done and structure the date around that. Does this 
make sense?



If Jira is overburdened, move everything out and have people move things

back. We have multiple tools -- we should at least have one in use.
Otherwise, this just seems like there are decisions happening behind the
scenes.



You lost me. Every release, we triage (finish, reject, or bump) open
issues; nobody's done that yet for 2.0. That's all I was talking about with
regard to the issue tracker noise.


I thought you were saying that there were too many open issues on Jira 
to glean any information on outstanding work from it. I was trying to 
give a suggestion about how to move past that.


Re: [DISCUSS] Draft release timeline for 2.0.0

2018-06-12 Thread Josh Elser




On 6/12/18 10:36 AM, Keith Turner wrote:

On Mon, Jun 11, 2018 at 10:46 PM, Josh Elser  wrote:

I'm just trying to point out the fallacy of meeting deadlines when the
criteria for "success" is undefined.

If Jira is overburdened, move everything out and have people move things
back. We have multiple tools -- we should at least have one in use.
Otherwise, this just seems like there are decisions happening behind the
scenes.

To communicate what we would like to see in 2.0.0, I propose opening a
Github issue, tagging it as 2.0.0, and marking it as a blocker.  We
can always triage and discuss the open blockers later in the
summer.



That'd be great.


Re: [DISCUSS] Draft release timeline for 2.0.0

2018-06-11 Thread Josh Elser
I'm just trying to point out the fallacy of meeting deadlines when the
criteria for "success" is undefined.

If Jira is overburdened, move everything out and have people move things
back. We have multiple tools -- we should at least have one in use.
Otherwise, this just seems like there are decisions happening behind the
scenes.

On Mon, Jun 11, 2018, 7:52 PM Christopher  wrote:

> I do not expect that page to be a complete or final set of features right
> now, but it's probably better than the issue tracker is (because of all the
> noise of old issues). Part of the goal of this thread was to motivate
> people to start finalizing that set over the next few weeks as they triage
> open issues and think about what they can realistically finish in the
> timeline we establish. The hope is that the page will become more and more
> complete as we head more strongly towards this release.
>
> As for the timeline, I have no problem moving the time table up if we get a
> bit further along and realize we're in a good place to release. I just
> don't like the pressure of unrealistically short timelines, and I know that
> personally, my summer is going to be very busy regardless. Initially, I was
> hoping we could release around September 1st... but then I figured adding a
> month for dedicated testing and documentation might be nice... and we'd
> still release before the summit.
>
>
> On Mon, Jun 11, 2018 at 6:36 PM Josh Elser  wrote:
>
> > Based on that, https://issues.apache.org/jira/browse/ACCUMULO-4733 is
> > the only thing outstanding (and just one question at that).
> >
> > Mid/late August seems like a long time until feature-complete for
> > essentially a no-op of work :)
> >
> > On 6/11/18 5:07 PM, Christopher wrote:
> > > I believe those are being maintained in the draft release notes at
> > > https://accumulo.apache.org/release/accumulo-2.0.0/
> > >
> > > On Mon, Jun 11, 2018 at 5:02 PM Josh Elser  wrote:
> > >
> > >> What are the current 2.0.0 features? (Outstanding and completed)
> > >>
> > >> On 6/11/18 4:35 PM, Christopher wrote:
> > >>> Hi Accumulo Devs,
> > >>>
> > >>> I've been thinking about the 2.0.0 release timeline. I was thinking
> > >>> something like this milestone timeline:
> > >>>
> > >>> Feature Complete : mid-late August
> > >>> Dedicated Testing, Documentation, and release voting : all of
> September
> > >>> Final release : October 1st
> > >>>
> > >>> This schedule would make 2.0.0 available for the Accumulo Summit
> coming
> > >> up
> > >>> in October, with a few weeks to spare.
> > >>>
> > >>
> > >
> >
>


Re: [DISCUSS] Draft release timeline for 2.0.0

2018-06-11 Thread Josh Elser
Based on that, https://issues.apache.org/jira/browse/ACCUMULO-4733 is 
the only thing outstanding (and just one question at that).


Mid/late August seems like a long time until feature-complete for 
essentially a no-op of work :)


On 6/11/18 5:07 PM, Christopher wrote:

I believe those are being maintained in the draft release notes at
https://accumulo.apache.org/release/accumulo-2.0.0/

On Mon, Jun 11, 2018 at 5:02 PM Josh Elser  wrote:


What are the current 2.0.0 features? (Outstanding and completed)

On 6/11/18 4:35 PM, Christopher wrote:

Hi Accumulo Devs,

I've been thinking about the 2.0.0 release timeline. I was thinking
something like this milestone timeline:

Feature Complete : mid-late August
Dedicated Testing, Documentation, and release voting : all of September
Final release : October 1st

This schedule would make 2.0.0 available for the Accumulo Summit coming

up

in October, with a few weeks to spare.







Re: Number of entries

2018-06-04 Thread Josh Elser

Hi Marcus,

Via what means?

This information is present on the Accumulo Monitor UI already, lagging 
only by a compaction happening on relevant Tablets. You can easily look 
at this data for just about any installation.


If via code, I don't believe there is public API (stable) for requesting 
the table sizes, but the Monitor pulls this data from the 
accumulo:metadata table. You could do the same. The accumulo:metadata 
table has a reference to each file contained by a table, and with it the 
number of entries in that file. It's a simple calculation to compute the 
number of entries for a table once you can extract the number of entries 
for a single tablet.
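If you do go the code route, the aggregation step described above is simple once you have the per-file entry counts. Below is a minimal, hypothetical sketch of just that step: it assumes the metadata file column values use the internal DataFileValue string encoding of "size,entries" (an implementation detail, not a stable public API), and it omits the actual scan of the accumulo:metadata table, which requires the Accumulo client library and a running instance.

```java
import java.util.Arrays;
import java.util.List;

public class EntryCountSketch {
    // Sum the entry counts from metadata "file" column values. Each value is
    // assumed to be encoded as "<sizeInBytes>,<numEntries>" (the DataFileValue
    // string form); only the second field matters here.
    static long totalEntries(List<String> dataFileValues) {
        long total = 0;
        for (String v : dataFileValues) {
            String[] parts = v.split(",");
            total += Long.parseLong(parts[1].trim());
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values, as if scanned from accumulo:metadata for one table.
        List<String> values = Arrays.asList("1048576,5000", "524288,2500", "2097152,10000");
        System.out.println(totalEntries(values)); // prints 17500
    }
}
```

Note this count lags reality the same way the Monitor does: entries in memory that have not yet been minor-compacted to a file are not reflected.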


On 6/4/18 11:48 AM, Mauro Schneider wrote:

Hello

How to find out a number of entries of table with a ton of data in the
Accumulo ?




Mauro Schneider



Re: Java (eventually) dropping Serialization

2018-05-30 Thread Josh Elser
Also see the xolstice protobuf-maven-plugin which marries up nicely with 
the OS properties.


On 5/30/18 3:40 PM, Brian Loss wrote:

If I understand what you are asking correctly, os-maven-plugin [1] is what you 
are looking for. It will determine the os name and arch (it puts them in 
properties os.detected.name and os.detected.arch) and you can use those values 
to declare the right executableDependency [2] for the exec-maven-plugin.

[1] 
http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22kr.motd.maven%22%20AND%20a%3A%22os-maven-plugin%22
[2] 
https://www.mojohaus.org/exec-maven-plugin/examples/example-exec-using-executabledependency.html
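For concreteness, the usual shape of that setup looks roughly like the following pom fragment. This is a sketch based on the os-maven-plugin and exec-maven-plugin documentation, not something tested against Accumulo's build; the protoc version is a placeholder. The extension populates the os.detected.* properties, which then select the matching pre-built protoc binary from Maven Central.

```xml
<build>
  <extensions>
    <!-- Populates os.detected.name, os.detected.arch, os.detected.classifier -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.6.2</version>
    </extension>
  </extensions>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <configuration>
        <!-- Run the protoc artifact declared in this plugin's dependencies -->
        <executableDependency>
          <groupId>com.google.protobuf</groupId>
          <artifactId>protoc</artifactId>
        </executableDependency>
      </configuration>
      <dependencies>
        <!-- Pre-built protoc binary matching the detected platform -->
        <dependency>
          <groupId>com.google.protobuf</groupId>
          <artifactId>protoc</artifactId>
          <version>3.5.1</version>
          <classifier>${os.detected.classifier}</classifier>
          <type>exe</type>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```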

From: Christopher [ctubb...@apache.org]
Sent: Wednesday, May 30, 2018 3:17 PM
To: dev@accumulo.apache.org
Subject: Re: Java (eventually) dropping Serialization

I wasn't aware they were publishing pre-built binaries for various
platforms to Maven Central. That could be quite useful if we could
automatically download the correct one during the Maven build, and use that
to generate the code. It could still be problematic if they are dynamically
linked to specific version ranges of system libraries, but I'd be
interested in trying. Do you know if that tooling already exists as a Maven
plugin or similar?



Re: Java (eventually) dropping Serialization

2018-05-30 Thread Josh Elser




On 5/30/18 12:41 PM, Christopher wrote:

On Wed, May 30, 2018 at 11:59 AM Josh Elser  wrote:


On 5/30/18 9:08 AM, Keith Turner wrote:

On Wed, May 30, 2018 at 12:16 AM, Christopher 

wrote:

I thought this was interesting:


https://www.infoworld.com/article/3275924/java/oracle-plans-to-dump-risky-java-serialization.html


If the long-term plan is to remove serialization from Java classes (in
favor of a lightweight, possibly pluggable, "Records" serialization
framework), we should begin thinking about how we use serialization in
Accumulo's code today. At the very least, we should try to avoid any
reliance on it in any future persistence of objects in Accumulo. If we

see

an opportunity to remove it in our current code anywhere, it might be

worth

spending the time to follow through with such a change.

Of course, this is probably going to be a *very* long time before it is
actually dropped from Java, but it's not going to hurt to start thinking
about it now.

(Accumulo uses Java serialization for storing FaTE transaction

information,

and perhaps elsewhere.)


We currently do not support FaTE transactions across minor versions.
The upgrade code checks for any outstanding FaTE transactions.  So
this makes it easier to upgrade on a minor version.  I would like to
see FaTE use a human-readable format like JSON because it would make
debugging easier.


I'd strongly suggest against using JSON as it forces the application
to know how to handle drift in "schema". It would be nice to avoid the
need to flush the outstanding fate txns on upgrade.

If you just want a JSON-ish way to look at the data, I'd suggest moving
over to protobuf3 and check out the support they have around JSON.

https://developers.google.com/protocol-buffers/docs/proto3#json



Protobuf certainly has better support for schemas... but I like the
simplicity of using JSON directly and managing our own schema for FaTE to
reduce dependencies. (Also, protobuf does not have a native Java compiler,
AFAICT, which makes it a pain, similar to thrift, for portable code
generation.) Whichever we choose, though, we've got plenty of time to
hammer out these pros and cons, and experiment.


Actually, you don't need to do a custom compiler installation for 
Protobuf3 on the majority of arches as there are compilers available via 
Maven central for protobuf on x86/64 and ppc. This is a non-issue for 
the majority of platforms.


http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22protoc%22

Managing your own schema is silly when there are tools whose specific 
purpose in creation was "[schema management is hard on its own and we 
can make it easier with guiderails]". Smells like "not-invented-here" to me.


Re: Java (eventually) dropping Serialization

2018-05-30 Thread Josh Elser

On 5/30/18 9:08 AM, Keith Turner wrote:

On Wed, May 30, 2018 at 12:16 AM, Christopher  wrote:

I thought this was interesting:
https://www.infoworld.com/article/3275924/java/oracle-plans-to-dump-risky-java-serialization.html

If the long-term plan is to remove serialization from Java classes (in
favor of a lightweight, possibly pluggable, "Records" serialization
framework), we should begin thinking about how we use serialization in
Accumulo's code today. At the very least, we should try to avoid any
reliance on it in any future persistence of objects in Accumulo. If we see
an opportunity to remove it in our current code anywhere, it might be worth
spending the time to follow through with such a change.

Of course, this is probably going to be a *very* long time before it is
actually dropped from Java, but it's not going to hurt to start thinking
about it now.

(Accumulo uses Java serialization for storing FaTE transaction information,
and perhaps elsewhere.)


We currently do not support FaTE transactions across minor versions.
The upgrade code checks for any outstanding FaTE transactions.  So
this makes it easier to upgrade on a minor version.  I would like to
see FaTE use a human-readable format like JSON because it would make
debugging easier.


I'd strongly suggest against using JSON as it forces the application 
to know how to handle drift in "schema". It would be nice to avoid the 
need to flush the outstanding fate txns on upgrade.


If you just want a JSON-ish way to look at the data, I'd suggest moving 
over to protobuf3 and check out the support they have around JSON.


https://developers.google.com/protocol-buffers/docs/proto3#json
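To make the schema-drift concern concrete, here is a minimal, hypothetical sketch (not Accumulo's actual FaTE code, and the record fields are invented for illustration) of explicit versioned binary encoding with java.io streams. A version byte up front lets a reader handle records written by older code; this is exactly the bookkeeping a schema-aware format like protobuf automates, and what hand-rolled JSON forces every reader to reimplement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionedRecordSketch {
    static final byte VERSION = 2; // hypothetical: v2 added the "status" field

    // Encode fields explicitly instead of relying on java.io.Serializable.
    static byte[] encode(long txid, String status) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(VERSION);
        out.writeLong(txid);
        out.writeUTF(status);
        return bytes.toByteArray();
    }

    // The decoder handles drift: hypothetical v1 records lacked "status",
    // so a v1 reader path supplies a default instead of failing.
    static String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte version = in.readByte();
        long txid = in.readLong();
        String status = version >= 2 ? in.readUTF() : "UNKNOWN";
        return txid + ":" + status;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(encode(42L, "IN_PROGRESS"))); // prints 42:IN_PROGRESS
    }
}
```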


Re: Use of Flush Table Operation

2018-04-23 Thread Josh Elser

Can you give some more context?

Strikes me as strange to be wanting to change a method which we want to 
remove (being deprecated).


On 4/23/18 6:05 PM, Mike Miller wrote:

Quick Survey:

Does your project use the flush Table Operation in Accumulo? I am looking
into changing the default behavior of the deprecated flush method[1] and
was wondering if and how it is currently being used.

Any response would be helpful.  Thanks!

[1]
https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java#L471



Re: [DRAFT] [REPORT] Apache Accumulo - March 2018

2018-04-09 Thread Josh Elser



On 4/9/18 12:40 PM, Michael Wall wrote:

Thanks for reading this Josh.

1 - In my first draft I had no information about Tony but I felt like that
was going to prompt questions.  I agree contributor is more relevant but he
is not listed on the people page.  Do we think that matters?


I don't think that matters. The contributors page is code-contributors, 
but the board certainly knows that such lists are not all-encompassing.



2 - What were you thinking on the "what's next"?  Maybe it is time for us
to really define a roadmap, not sure there is time before this is due on
Wed though.


My omission of specifics here was intentional ;) I don't have the time 
to devote meaningful cycles here, but I would agree with you that 
something should probably be documented. If nothing else, acknowledging 
that a roadmap is necessary is a sufficient improvement!


Re: [DRAFT] [REPORT] Apache Accumulo - March 2018

2018-04-09 Thread Josh Elser

Two minor suggestions:

* Strike the "A member of the Apache NiFi PMC" part. The fact that Tony 
was a NiFi PMC member isn't really relevant to the request, is it? IMO, if 
anything, it's more relevant that Tony is a contributor to Accumulo.
* There isn't any discussion about what is coming next -- we have the 
1.7.4 release, but no details about what's in the pipeline.


On 4/7/18 10:19 AM, Michael Wall wrote:

The Apache Accumulo PMC decided to draft its quarterly board
reports on the dev list. Here is a draft of our report which is due
by Wednesday, Mar 11, 1 week before the board meeting on
Wednesday, Mar 18. Please let me know if you have any suggestions.
I plan to submit it late on the 10th.

Mike

--

## Description:
  - The Apache Accumulo sorted, distributed key/value
  store is a robust, scalable, high performance data storage system that
  features cell-based access control and customizable server-side
  processing.  It is based on Google's BigTable design and is built on
  top of Apache Hadoop, Zookeeper, and Thrift.

## Issues:
  - There are no issues requiring board attention at this time.

## Activity:
  - There was 1 new release, Accumulo 1.7.4, since the last report.
  - There were 4 new committers since the last report.  All committers
  are also PMC members.
  - The PMC decided to switch to using GitHub issues for the project
  and all subprojects.
  - The PMC decided to drop support for Hadoop 2 in Accumulo 2.0.
  - A member of the Apache Nifi PMC requested permission to use the
Accumulo logo on t-shirts and/or stickers to promote projects he uses.
VP Brand Management and the PMC had no objections.

## Health report:
  - The project remains healthy.  Activity levels on mailing lists, git
  and JIRA remain constant.

## PMC changes:

  - Currently 34 PMC members.
  - New PMC members:
 - Adam J. Shook was added to the PMC on Wed Jan 24 2018
 - Mark Owens was added to the PMC on Tue Mar 20 2018
 - Luis Tavarez was added to the PMC on Tue Mar 20 2018
 - Nick Felts was added to the PMC on Thu Mar 22 2018

## Committer base changes:

  - Currently 34 committers.
  - New committers:
 - Adam J. Shook was added as a committer on Wed Jan 24 2018
 - Mark Owens was added as a committer on Wed Mar 21 2018
 - Luis Tavarez was added as a committer on Wed Mar 21 2018
 - Nick Felts was added as a committer on Sat Mar 24 2018

## Releases:

  - accumulo-1.7.4 was released on Thu Mar 22 2018

## Mailing list activity:

  - Nothing significant in the figures

## JIRA activity:

  - 65 JIRA tickets created in the last 3 months
  - 101 JIRA tickets closed/resolved in the last 3 months



Re: [VOTE] Accumulo 1.7.4-rc1

2018-03-23 Thread Josh Elser
Yup, that's the only thing that would have come to my mind (hadoop bugs 
that have been long-fixed).


On 3/22/18 10:07 PM, Christopher wrote:

Yeah, I vaguely remember that now. It definitely seems to be the problem
here. I believe it is one of the reasons we moved to 2.6.4 in 1.8+

I had forgotten about that. The 1.7 branch was still building with Hadoop
2.2.0 and I now do my testing with jdk8 only, so that's why it kept being a
problem for me.

Thanks for the info. I spent way too much time this week looking into this
when all I needed to do was test with a newer version of Hadoop (or an
older version of jdk), but at least I now know what the problem was.

On Thu, Mar 22, 2018, 21:15 Billie Rinaldi <billie.rina...@gmail.com> wrote:


On Thu, Mar 22, 2018 at 2:31 PM, Christopher <ctubb...@apache.org> wrote:


Josh, I know you said you didn't have much time, but just in case you get a
moment: do you know why `UserGroupInformation.isLoginKeytabBased()` might
be false on the server side? This seems to be the root cause of the
problems.



I recall running into HADOOP-10786 a while ago ("in java 8 isKeyTab is
always false given the current UGI implementation"). Not sure if that would
be relevant here.




https://github.com/apache/accumulo/blob/b0016c3ca36e15ee4bdde727ea5b6a18597de0ff/core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java#L383


On Thu, Mar 22, 2018 at 4:00 PM Josh Elser <els...@apache.org> wrote:


I don't have the time to look at these right now. There isn't much
special about how Accumulo uses Kerberos either. It's straightforward
use via SASL with Thrift. I haven't looked at it since it was passing
when I wrote it originally.

On 3/20/18 2:32 PM, Christopher wrote:

I'm currently looking at the KerberosRenewalIT failures that seem to be
persisting across branches. From the logs, it looks like the accumulo
services are trying to do ticket-cache based login renewals, instead of
keytab-based renewals. This has been a problematic test for me before,
and as such, I've gotten into the habit of ignoring it, but since I've
not been able to get it to work on reruns, and it fails nearly 100% of
the time (if not 100%) for me now, I decided to take a closer look. If
it is doing ticket-cache based renewals, that could indicate a bug in
the Kerberos authentication, and that would probably warrant a -1 from
me... but I will continue to investigate first.

Josh, you know more about the Kerberos stuff than anyone here, so if you
have time/interest, I wouldn't mind getting your feedback on why this
test might be failing for me.

On Mon, Mar 19, 2018 at 3:44 PM Christopher <ctubb...@apache.org> wrote:

 Accumulo Developers,

 Please consider the following candidate for Accumulo 1.7.4.

 Git Commit:
  b2a59189108d736729432e81b3d5717000c6b891
 Branch:
  1.7.4-rc1

 If this vote passes, a gpg-signed tag will be created using:
  git tag -f -m 'Apache Accumulo 1.7.4' -s rel/1.7.4 \
  b2a59189108d736729432e81b3d5717000c6b891

 Staging repo:
 https://repository.apache.org/content/repositories/orgapacheaccumulo-1068

 Source (official release artifact):
 https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-src.tar.gz

 Binary:
 https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-bin.tar.gz

 (Append ".sha1", ".md5", or ".asc" to download the signature/hash
 for a given artifact.)

 All artifacts were built and staged with:
  mvn release:prepare && mvn release:perform

 Signing keys are available at https://www.apache.org/dist/accumulo/KEYS

 (Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D)

 Release notes (in progress) can be found at:
 https://accumulo.apache.org/release/accumulo-1.7.4/

 Please vote one of:
 [ ] +1 - I have verified and accept...
 [ ] +0 - I have reservations, but not strong enough to vote against...
 [ ] -1 - Because..., I do not accept...
 ... these artifacts as the 1.7.4 release of Apache Accumulo.

 This vote will remain open until at least Thu Mar 22 20:00:00 UTC 2018
 (Thu Mar 22 16:00:00 EDT 2018 / Thu Mar 22 13:00:00 PDT 2018).
 Voting continues until the release manager sends an email closing
 the vote.

 Thanks!

 P.S. Hint: download the whole staging repo with
  wget -erobots=off -r -l inf -np -nH \
  https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/
  # note the trailing slash is needed
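For reviewers, the verification steps boil down to recomputing hashes locally and comparing them against the published ones. A minimal, hedged sketch follows: the tarball here is a stand-in file created on the spot, not the real artifact, and with a real download you would also run `gpg --verify` on the `.asc` signature against the KEYS file.

```shell
# Sketch of hash verification for a release candidate artifact.
# The artifact is a placeholder; with a real RC, download the tarball
# and its published .sha1 from the staging repo instead.
tarball=accumulo-1.7.4-src.tar.gz
printf 'example artifact contents' > "$tarball"           # stand-in for the download
sha1sum "$tarball" | awk '{print $1}' > "$tarball.sha1"   # normally fetched, not generated
expected=$(cat "$tarball.sha1")
actual=$(sha1sum "$tarball" | awk '{print $1}')
[ "$expected" = "$actual" ] && echo "sha1 OK" || echo "sha1 MISMATCH"
```

With the real staging repo, the same comparison applies to each `.sha1` and `.md5` sibling file fetched alongside the artifacts.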











Re: [VOTE] Accumulo 1.7.4-rc1

2018-03-22 Thread Josh Elser
I don't have the time to look at these right now. There isn't much 
special about how Accumulo uses Kerberos either. It's straightforward 
use via SASL with Thrift. I haven't looked at it since it was passing 
when I wrote it originally.


On 3/20/18 2:32 PM, Christopher wrote:
I'm currently looking at the KerberosRenewalIT failures that seem to be 
persisting across branches. From the logs, it looks like the accumulo 
services are trying to do ticket-cache based login renewals, instead of 
keytab-based renewals. This has been a problematic test for me before, 
and as such, I've gotten into the habit of ignoring it, but since I've 
not been able to get it to work on reruns, and it fails nearly 100% of 
the time (if not 100%) for me now, I decided to take a closer look. If 
it is doing ticket-cache based renewals, that could indicate a bug in 
the Kerberos authentication, and that would probably warrant a -1 from 
me... but I will continue to investigate first.


Josh, you know more about the Kerberos stuff than anyone here, so if you 
have time/interest, I wouldn't mind getting your feedback on why this 
test might be failing for me.
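For context on the keytab vs. ticket-cache distinction discussed above: in Accumulo 1.x the servers are pointed at a keytab through site configuration roughly along these lines. This is a hedged sketch from memory — the paths and realm are placeholders, and the property names should be checked against the Kerberos documentation for the specific release.

```xml
<!-- accumulo-site.xml: placeholder values, not a tested configuration -->
<property>
  <name>instance.rpc.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>general.kerberos.keytab</name>
  <value>/etc/security/keytabs/accumulo.service.keytab</value>
</property>
<property>
  <name>general.kerberos.principal</name>
  <value>accumulo/_HOST@EXAMPLE.COM</value>
</property>
```

If the servers instead fall back to a ticket-cache login (as the logs above suggest), `UserGroupInformation.isLoginKeytabBased()` would report false on the server side.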


On Mon, Mar 19, 2018 at 3:44 PM Christopher wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.4.

Git Commit:
     b2a59189108d736729432e81b3d5717000c6b891
Branch:
     1.7.4-rc1

If this vote passes, a gpg-signed tag will be created using:
     git tag -f -m 'Apache Accumulo 1.7.4' -s rel/1.7.4 \
     b2a59189108d736729432e81b3d5717000c6b891

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1068
Source (official release artifact):

https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-src.tar.gz
Binary:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/org/apache/accumulo/accumulo/1.7.4/accumulo-1.7.4-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash
for a given artifact.)

All artifacts were built and staged with:
     mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 8CC4F8A2B29C2B040F2B835D6F0CDAE700B6899D)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release/accumulo-1.7.4/

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.4 release of Apache Accumulo.

This vote will remain open until at least Thu Mar 22 20:00:00 UTC 2018
(Thu Mar 22 16:00:00 EDT 2018 / Thu Mar 22 13:00:00 PDT 2018).
Voting continues until the release manager sends an email closing
the vote.

Thanks!

P.S. Hint: download the whole staging repo with
     wget -erobots=off -r -l inf -np -nH \
https://repository.apache.org/content/repositories/orgapacheaccumulo-1068/
     # note the trailing slash is needed



Re: [DISCUSS] Remove tracer service (not instrumentation)

2018-03-17 Thread Josh Elser

That was my expectation on how it would work

My +1 was the idea of moving the Tracer to a separate service and having 
clear instructions for how users get back to the current functionality 
(how these two repositories get deployed), *before* it's removed from 
core Accumulo.


This is because of the very clear testimony from a user about how useful
the feature was to them.


On 3/16/18 7:36 PM, Christopher wrote:

Would you both (Michael and Josh) be okay with moving it to a separate repo
within the Accumulo project rather than ripping it out and leaving it only
buried in git history?

On Fri, Mar 16, 2018 at 7:15 PM Josh Elser <els...@apache.org> wrote:


I think I'm in agreement with this subset of Mikes.

I like the idea long-term. The tracing service is "add-on", and can live
outside Accumulo.

I don't like the idea of moving the code out and taking away code which
is functional today. I am +1 on the idea of building the same
functionality outside of the core product. I am -1 on removing the
functionality in the core product until the replacement is ready (e.g.
clear docs for users covering how they get back to "normal").

On 3/16/18 6:49 PM, Michael Wall wrote:

Yeah, I get it.  That should have said "without a working example
alternative".  Something to make it as easy as possible for someone
currently using tracing to not lose functionality.

Thanks

On Fri, Mar 16, 2018, 18:38 Christopher <ctubb...@apache.org> wrote:


The alternative is to configure any of the other HTrace sinks which are
available. The current code for Accumulo's tracer service could even be
forked and supported as a separate sink to optionally use (but as I said in
my original email, I think it'd be better to encourage contribution to
other presentation projects to use Accumulo as a backing store).

On Fri, Mar 16, 2018 at 6:34 PM Michael Wall <mjw...@apache.org> wrote:


I am in favor of removing the tracer ui from the monitor and the tracer
service that stores the spans in Accumulo.  I worry about doing so with a
working alternative though.

On Fri, Mar 16, 2018 at 6:25 PM Mike Drob <md...@apache.org> wrote:


Do we have a migration story ready to go for folks that are used to seeing
traces on the monitor?

On Fri, Mar 16, 2018 at 5:17 PM, Tony Kurc <trk...@gmail.com> wrote:


I like this idea.

On Fri, Mar 16, 2018 at 5:09 PM, Christopher <ctubb...@apache.org>

wrote:



Devs,

(This discussion is somewhat of a spinoff of our previous recent
conversation about HTrace, but I'd like to narrow the discussion to one
specific topic regarding our tracer service.)

I'd like to remove Accumulo's tracer service and corresponding
presentations in the monitor for 2.0.

The tracer service currently acts as a sink for the traces from Accumulo.
While there is interest in tracing Accumulo, and Accumulo may itself be
suitable (with the right schema) for storing traces, I do not think acting
as a "trace sink" is really the kind of thing we should be doing as part
of Accumulo's out-of-the-box core functionality.

Also, the presentation and search capabilities of the traces found in the
trace table (by convention, and assumed by the monitor) is far from an
ideal presentation of this data, and I don't think the Accumulo project
should continue maintaining that inside the core project's monitor,
either.

I think we should encourage interested volunteers to contribute to other
trace presentation software (wherever they may exist) any necessary
"backing store" implementation based on Accumulo.

None of this would remove tracing instrumentation from Accumulo... it
would just require users interested in trace data from Accumulo to
configure an appropriate sink to collect that data in some other
integrated component of their overall architecture.

Decoupling the integrated trace sink from the instrumentation in Accumulo
like this could even be a step towards providing support for multiple
different tracing libraries. (I guess we could do this now, but it would
be easier if we were not also trying to provide a sink implementation for
one specific version of one specific instrumentation library.)

Thoughts?

















Re: [DISCUSS] Remove tracer service (not instrumentation)

2018-03-16 Thread Josh Elser

I think I'm in agreement with this subset of Mikes.

I like the idea long-term. The tracing service is "add-on", and can live 
outside Accumulo.


I don't like the idea of moving the code out and taking away code which 
is functional today. I am +1 on the idea of building the same 
functionality outside of the core product. I am -1 on removing the 
functionality in the core product until the replacement is ready (e.g. 
clear docs for users covering how they get back to "normal").


On 3/16/18 6:49 PM, Michael Wall wrote:

Yeah, I get it.  That should have said "without a working example
alternative".  Something to make it as easy as possible for someone
currently using tracing to not lose functionality.

Thanks

On Fri, Mar 16, 2018, 18:38 Christopher  wrote:


The alternative is to configure any of the other HTrace sinks which are
available. The current code for Accumulo's tracer service could even be
forked and supported as a separate sink to optionally use (but as I said in
my original email, I think it'd be better to encourage contribution to
other presentation projects to use Accumulo as a backing store).

On Fri, Mar 16, 2018 at 6:34 PM Michael Wall  wrote:


I am in favor of removing the tracer ui from the monitor and the tracer
service that stores the spans in Accumulo.  I worry about doing so with a
working alternative though.

On Fri, Mar 16, 2018 at 6:25 PM Mike Drob  wrote:


Do we have a migration story ready to go for folks that are used to seeing
traces on the monitor?

On Fri, Mar 16, 2018 at 5:17 PM, Tony Kurc  wrote:


I like this idea.

On Fri, Mar 16, 2018 at 5:09 PM, Christopher 

wrote:



Devs,

(This discussion is somewhat of a spinoff of our previous recent
conversation about HTrace, but I'd like to narrow the discussion to one
specific topic regarding our tracer service.)

I'd like to remove Accumulo's tracer service and corresponding
presentations in the monitor for 2.0.

The tracer service currently acts as a sink for the traces from Accumulo.
While there is interest in tracing Accumulo, and Accumulo may itself be
suitable (with the right schema) for storing traces, I do not think acting
as a "trace sink" is really the kind of thing we should be doing as part
of Accumulo's out-of-the-box core functionality.

Also, the presentation and search capabilities of the traces found in the
trace table (by convention, and assumed by the monitor) is far from an
ideal presentation of this data, and I don't think the Accumulo project
should continue maintaining that inside the core project's monitor,
either.

I think we should encourage interested volunteers to contribute to other
trace presentation software (wherever they may exist) any necessary
"backing store" implementation based on Accumulo.

None of this would remove tracing instrumentation from Accumulo... it
would just require users interested in trace data from Accumulo to
configure an appropriate sink to collect that data in some other
integrated component of their overall architecture.

Decoupling the integrated trace sink from the instrumentation in Accumulo
like this could even be a step towards providing support for multiple
different tracing libraries. (I guess we could do this now, but it would
be easier if we were not also trying to provide a sink implementation for
one specific version of one specific instrumentation library.)

Thoughts?













Re: [VOTE] Switch to GitHub issues

2018-03-13 Thread Josh Elser
+0 since there seems to be such a strong desire to use this that I just 
don't quite understand :)


Thanks to those who worked to clarify the ambiguity/issues that I was 
worried about previously.


On 3/13/18 12:53 PM, Keith Turner wrote:

Accumulo PMC,

Please vote on initiating the transition from JIRA to GitHub. The
purpose of this vote is to see if there is agreement on the following
three items the community discussed.

   * Using this workflow initially:
  https://github.com/apache/accumulo-website/pull/59
   * The ability to modify the workflow via lazy consensus through
discussions on the dev list.
   * The goal of only using one issue tracker after a transition period
in which two are used.

If the vote passes, I will ask Infra to enable Github Issues and then
merge the website PR.

This vote will be open through at least Fri Mar 16 16:45:00 UTC 2018
(Fri Mar 16 12:45:00 EDT 2018 / Fri Mar 16 09:45:00 PDT 2018)



Re: Accumulo stickers or t-shirts?

2018-03-12 Thread Josh Elser
IIRC, as long as you have PMC approval and you're not profiting off of 
the swag, it's ok from the ASF point of view. I can't find ASF 
trademarks documentation on it at the moment, however.


On 3/11/18 12:52 PM, Tony Kurc wrote:

Hi,
I was wondering if anyone had ever designed and printed either stickers or
t-shirts (or other cool stuff) for Apache Accumulo. I am likely to head to
a conference soon and would like to have some things for trading and/or
giveaways for the project. If there were some things that are already
designed that are already "blessed" by the project, that would be awesome.
If there aren't, do you all have a process for someone vetting a proposed
design?

Tony



Re: [DISCUSS] Switch to GitHub issues after trial

2018-03-07 Thread Josh Elser

Sorry for the top-post.

I really appreciate the numbered list below, Keith. Specifically the 
answers to #1 and #4 make me very happy.


I think #5 needs to be a little more concrete (IMO, you should just
decide what it should be).


#6: +1 to a message to private; this is how Apache generally requests this
be done.


While I can appreciate your stance on #3, and I think I would not call it
a blocker either, this is probably something worth the 15-30 minutes of
investigation. Sean/Mike may feel more strongly than I do. Learning from
others, even if it's just dropping an email to dev@spark directly to ask
the question, goes a long way.


On 3/7/18 10:55 AM, Keith Turner wrote:

On Mon, Mar 5, 2018 at 6:07 PM, Keith Turner <ke...@deenlo.com> wrote:

On Thu, Feb 15, 2018 at 12:52 PM, Josh Elser <els...@apache.org> wrote:

-0 as an initial reaction because I'm still not convinced that GH issues
provides any additional features or better experience than JIRA does, and
this change would only serve to fragment an already bare community.

My concerns that would push that -0 to a -1 include (but aren't limited to):

* Documentation/website update for the release process
* Validation that our release processes on JIRA has similar functionality on
GH issues
* Updated contributor docs (removing JIRA related content, add an
explanation as to the change)
* CONTRIBUTING.md updated on relevant repos


I opened the following PR with a proposal for how we could start using GitHub.

https://github.com/apache/accumulo-website/pull/59


There were lots of valid concerns raised during this discussion.  The
concerns shaped the proposal I submitted. Rather than reply to them
individually in different emails I am collecting them all here and
sharing my thoughts about them.


1. How do we release?

   JIRA is used in three important ways for releases : setting blockers,
   triaging issues, and generating release notes.  I think the proposal
   addresses all three.

2. Will we document contributor guidelines to avoid confusion?

   What is expected of contributors is clearly documented.

3. Can someone investigate how Spark operates before switching?

   That would be great if someone volunteered to do this and wrote up their
   findings.  However if no one volunteers, then I do not think this should
   be a blocker.  There are many other projects that would be worthy of
   investigation also.

4. What is the migration plan for existing issues?  Will we have split issue
tracker for years?

   The proposal documents migrating existing JIRA issues as they are worked.
   This means that existing JIRA issues that are never worked will never
   migrate. After all branches are released, JIRA can be put in read only mode
   (only PMC can change it).  It will be left active for reference and
   migration of existing issues.

5. How will we handle fix versions?

The proposal suggests using GitHub issue labels for this. It also suggests
using a prefix on fix-version labels to make them sort last.

6. How will we handle security issues?

We need to clearly document on our website how users should report
security issues.  I am not sure this is done at the moment.  Since this
is infrequent I think we can handle this on the private list.  I think
our workflow should be optimized for frequent actions and not infrequent
ones.

7. Should we switch all repos to GH issues except Accumulo core?

I think this is a good example of how design by committee can go
wrong.  This is a really confusing solution that does not
improve our workflow, so the benefits are not clear to me.
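On point 5 above, the sort behavior is easy to demonstrate. The label names here are hypothetical: with a common prefix such as `v`, the fix-version labels group together and land after typical lowercase labels in an alphabetical listing.

```shell
# Hypothetical GitHub label set: the "v" prefix groups the
# fix-version labels and sorts them after ordinary labels.
printf '%s\n' 'bug' 'duplicate' 'enhancement' 'v1.9.0' 'v2.0.0' | sort
```

The same idea works with any prefix chosen to sort late; the point is only that version labels stop interleaving with workflow labels.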





- Josh


On 2/15/18 12:05 PM, Mike Walch wrote:


I would like to open discussion on moving from Jira to GitHub issues.
GitHub issues would be enabled for a trial period. After this trial
period,
the project would either move completely to GitHub issues or keep using
Jira. Two issue trackers would not be used after trial period.





Re: [DISCUSS] status of Hadoop 3 for 1.9.0 release

2018-03-01 Thread Josh Elser
Yeah, if Hadoop has changed their stance, propagating a "use at your own
risk" would be sufficient from our end.


On 3/1/18 6:06 PM, Christopher wrote:

If there's a risk, I'd suggest calling things out as "experimental" in the
release notes, and encourage users to try it and give us feedback.

On Thu, Mar 1, 2018 at 5:10 PM Sean Busbey  wrote:


hi folks!

While reviewing things in prep for getting our master branch over to
apache hadoop 3 only (see related discussion [1]), I noticed some wording
on the last RC[2] for Hadoop 3.0.1:


Please note:
* HDFS-12990. Change default NameNode RPC port back to 8020. It makes
incompatible changes to Hadoop 3.0.0. After 3.0.1 releases, Apache
Hadoop 3.0.0 will be deprecated due to this change.


Hadoop 3.0.0 was a production-ready release; the community did an extended
set of alpha/beta releases to shake out the kinds of things that would have
required labeling the X.Y.0 release as non-production in previous Hadoop 2
release lines. Deprecating it is a pretty strong signal, but from the
extended discussion[3] it seems to me that this isn't meant to indicate that
the entire 3.0 release line will stop.

What do folks think?

- No problem from our perspective?
- Worth waiting to ship a Hadoop 3 ready release until Hadoop 3.0.1 comes
out?
- Worth waiting to ship a Hadoop 3 ready release until Hadoop 3.1.0 comes
out?
- Leave things as-is and give a word of warning for would-be early
adopters in our release notes?
- Expressly call things out in our release notes as "experimental" and we
might make changes once later Hadoop 3s come out?

[1]: https://s.apache.org/pOKv
[2]: https://s.apache.org/brE4
[3]: https://s.apache.org/BWd6





Re: [DISCUSS] Switch to GitHub issues after trial

2018-03-01 Thread Josh Elser
After the rest of the discussion, I feel like I need to be explicit (so, 
I'm sorry if I'm being pedantic and we're already in agreement here):


You're planning to document how GitHub tech would be used to make 
releases on these repositories? And, we're in agreement that JIRA would 
not be used at all for these repositories?


In short, +0 as long as the process for releasing software is clear, I 
don't have issues with the process using different tooling than we 
presently use (although, still don't see the benefit to changing).


On 3/1/18 2:41 PM, Mike Walch wrote:

I would like to start up this discussion again. I don't think we have
reached consensus on moving the primary Accumulo repo to GitHub issues. The
primary repo has common workflows (i.e creating issues that affect multiple
versions) that don't easily transition to GitHub issues. I have heard
several solutions but no consensus.

As for moving our secondary repos (listed below), this seems much easier
and I haven't heard any concerns so far. Does anyone have concerns about
moving these repos?

https://github.com/apache/accumulo-docker
https://github.com/apache/accumulo-examples
https://github.com/apache/accumulo-testing
https://github.com/apache/accumulo-website
https://github.com/apache/accumulo-wikisearch


On Fri, Feb 16, 2018 at 10:54 AM, Sean Busbey  wrote:


On Fri, Feb 16, 2018 at 9:27 AM, Mike Walch  wrote:





Some of the concerns brought up would be answerable with a trial. How do
we do a release? What does aggregating issues fixed in a particular
version look like?



You can tag GH issues with a version, but I think it's best to just go
through commit history to compile the release notes. This should already
be done, as there is no guarantee even with Jira that all issues were
labeled correctly. If you are using GitHub issues, all issue numbers in
commits link back to the issue or pull request, which we don't have with
Jira right now.
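As a concrete (and hedged) illustration of compiling notes from commit history rather than tracker metadata: the commit subjects below are invented sample data, and in a real checkout you would feed in `git log --oneline <prev-release>..<new-release>` instead.

```shell
# Extract referenced issue/PR numbers from commit subjects to seed
# release notes. The log text is hypothetical sample data.
log='a1b2c3d Fix scanner session timeout (#123)
e4f5a6b Update build documentation
7c8d9e0 Close iterator resource leak (#456)'
printf '%s\n' "$log" | grep -oE '#[0-9]+' | sort -u
```

This yields the unique referenced numbers (#123 and #456 here), which can then be expanded into note entries by hand or via the tracker's API.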



This gets to an issue I have. What's our source of truth about "X is fixed
in Y" during the trial? I have been assuming that JIRA is currently our
source of truth, but maybe that's wrong. Is it the release notes?

IMHO, Git is a poor choice for the source of truth due to the immutability
of commit messages, at least in ASF contexts since we can't do force pushes
(in at least some branches).


--
busbey





Re: [DISCUSS] tracing framework updates

2018-02-28 Thread Josh Elser
Thanks for letting us know, Tony. I can totally understand how the 
server-side tracing (and collapsing it can do) would be super-helpful in 
figuring out what's happening.


I read that as one reason for simply not trying to get HDFS and Accumulo 
re-sync'ed. I think we have value in leaving what we presently have in 
Accumulo now over removing it completely.


On 2/27/18 8:50 PM, Tony Kurc wrote:

Josh,
It was exclusively the first - using the traces in the server-side code.
The most common case is "I have a scan which is much slower than expected",
and I couldn't figure out why. I'm trying to think of alternative approaches
to using the traces, and honestly, doing a bunch of log aggregation is the
alternative I'd have to fall back to, and in some cases recompiling parts
of accumulo with new log messages in place.


Tony

On Tue, Feb 27, 2018 at 7:18 PM, Josh Elser <els...@apache.org> wrote:


Oh, that's a pleasant surprise to hear, actually.

Anything you can share with the class, Tony? Would love to hear (even if
brief) how it was used and benefited you.

Specifically, I'm curious if...

* You looked at traces from our server-side instrumented code
* You instrumented your own code outside of Accumulo and used Accumulo as
the backing store
* You instrumented code inside/outside Accumulo and benefited from the
server-side instrumentation (e.g. your code's spans collapsing with the
server's spans)


On 2/27/18 6:52 PM, Tony Kurc wrote:


I'd personally be disappointed to see it removed. There is a bit of a
learning curve and startup cost to use it now, but when diagnosing major
challenges, it has been an invaluable capability.

On Feb 27, 2018 3:15 PM, "Josh Elser" <els...@apache.org> wrote:

Wow... that's, erm, quite the paper. Nothing like taking some pot-shots at
another software project and quoting folks out of context.

Does it help to break down the problem some more?

* Is Accumulo getting benefit from tracing its library?
* Is Accumulo getting benefit from tracing context including HDFS calls?

I feel like it is a nice tool to have in your toolbelt (having used it
successfully in the past), but I wonder if it's the most effective thing
to
keep inside of Accumulo. Specifically, would it be better to just pull
this
out of Accumulo outright?

I don't think I have an opinion yet.


On 2/27/18 1:08 PM, Ed Coleman wrote:

For general discussion - Facebook recently (Oct 28, 2017) published a

paper on tracing: Canopy: An End-to-End Performance Tracing and Analysis
System (https://research.fb.com/publications/canopy-end-to-end-
performance-tracing-at-scale/)

As a bonus, they referenced Accumulo and HTrace in section 2.2

"Mismatched models affected compatibility between mixed system versions;
e.g. Accumulo and Hadoop were impacted by the “continued lack of concern
in
the HTrace project around tracing during upgrades”


-Original Message-
From: Tony Kurc [mailto:tk...@apache.org]
Sent: Tuesday, February 27, 2018 12:57 PM
To: dev@accumulo.apache.org
Subject: Re: [DISCUSS] tracing framework updates

I have some experience with opentracing, and it definitely seems
promising, however, potentially promising in the same way htrace was...
That being said, I did a cursory thought exercise of what it would take
to
do a swap of the current tracing in accumulo to opentracing, and I didn't
come across any hard problems, meaning it could be a fairly
straightforward
refactor. I was hoping to explore the community a bit more at some
upcoming
conferences

On Feb 27, 2018 11:59 AM, "Sean Busbey" <bus...@apache.org> wrote:




On 2018/02/27 16:39:02, Christopher <ctubb...@apache.org> wrote:

I didn't realize HTrace was struggling in incubation. Maybe some of us can
start participating? The project did start within Accumulo, after all. What
does it need? I also wouldn't want to go back to maintaining cloudtrace.



I suspect it's too late for HTrace. The last commit to the main
development branch was May 2017. They had a decent run of activity in
2015 and an almost-resurgence in 2016, but they never really got
enough community traction to survive the normal ebb and flow of
contributor involvement.

They need the things any project needs to be sustainable: regular
release cadences, a responsive contribution process, and folks to do
the long slog of building interest via e.g. production adoption.

I'm unfamiliar with OpenTracing, but it was my understanding that Zipkin
was more of a tracing sink than an instrumentation API. HTrace is actually
listed as an instrumentation library for Zipkin (among others).



I think the key is that for an instrumentation library to get adoption
it needs a good sink that provides utility to operators looking to
diagnose problems. It took too long for HTrace to provide any tooling
that could help with even simple performance profiling. Maybe hooking
it into Zipkin would get around that. Personally, I never ma

Re: [DISCUSS] tracing framework updates

2018-02-27 Thread Josh Elser

Oh, that's a pleasant surprise to hear, actually.

Anything you can share with the class, Tony? Would love to hear (even if 
brief) how it was used and benefited you.


Specifically, I'm curious if...

* You looked at traces from our server-side instrumented code
* You instrumented your own code outside of Accumulo and used Accumulo 
as the backing store
* You instrumented code inside/outside Accumulo and benefited from the 
server-side instrumentation (e.g. your code's spans collapsing with the 
server's spans)


On 2/27/18 6:52 PM, Tony Kurc wrote:

I'd personally be disappointed to see it removed. There is a bit of a
learning curve and startup cost to use it now, but when diagnosing major
challenges, it has been an invaluable capability.

On Feb 27, 2018 3:15 PM, "Josh Elser" <els...@apache.org> wrote:

Wow... that's, erm, quite the paper. Nothing like taking some pot-shots at
another software project and quoting folks out of context.

Does it help to break down the problem some more?

* Is Accumulo getting benefit from tracing its library?
* Is Accumulo getting benefit from tracing context including HDFS calls?

I feel like it is a nice tool to have in your toolbelt (having used it
successfully in the past), but I wonder if it's the most effective thing to
keep inside of Accumulo. Specifically, would it be better to just pull this
out of Accumulo outright?

I don't think I have an opinion yet.


On 2/27/18 1:08 PM, Ed Coleman wrote:


For general discussion - Facebook recently (Oct 28, 2017) published a
paper on tracing: Canopy: An End-to-End Performance Tracing and Analysis
System (https://research.fb.com/publications/canopy-end-to-end-
performance-tracing-at-scale/)

As a bonus, they referenced Accumulo and HTrace in section 2.2

"Mismatched models affected compatibility between mixed system versions;
e.g. Accumulo and Hadoop were impacted by the “continued lack of concern in
the HTrace project around tracing during upgrades”"


-Original Message-
From: Tony Kurc [mailto:tk...@apache.org]
Sent: Tuesday, February 27, 2018 12:57 PM
To: dev@accumulo.apache.org
Subject: Re: [DISCUSS] tracing framework updates

I have some experience with opentracing, and it definitely seems
promising, however, potentially promising in the same way htrace was...
That being said, I did a cursory thought exercise of what it would take to
do a swap of the current tracing in accumulo to opentracing, and I didn't
come across any hard problems, meaning it could be a fairly straightforward
refactor. I was hoping to explore the community a bit more at some upcoming
conferences

On Feb 27, 2018 11:59 AM, "Sean Busbey" <bus...@apache.org> wrote:




On 2018/02/27 16:39:02, Christopher <ctubb...@apache.org> wrote:

I didn't realize HTrace was struggling in incubation. Maybe some of us can
start participating? The project did start within Accumulo, after all. What
does it need? I also wouldn't want to go back to maintaining cloudtrace.

I suspect it's too late for HTrace. The last commit to the main
development branch was May 2017. They had a decent run of activity in
2015 and an almost-resurgence in 2016, but they never really got
enough community traction to survive the normal ebb and flow of
contributor involvement.

They need the things any project needs to be sustainable: regular
release cadences, a responsive contribution process, and folks to do
the long slog of building interest via e.g. production adoption.

I'm unfamiliar with OpenTracing, but it was my understanding that
Zipkin was more of a tracing sink, than an instrumentation API.
HTrace is actually listed as an instrumentation library for Zipkin
(among others).



I think the key is that for an instrumentation library to get adoption
it needs a good sink that provides utility to operators looking to
diagnose problems. It took too long for HTrace to provide any tooling
that could help with even simple performance profiling. Maybe hooking
it into Zipkin would get around that. Personally, I never managed to
get the two to actually work together.

My listing Zipkin as an option merely reflects my prioritization of
practical impact of whatever we go to. I don't want to adopt some
blue-sky effort. FWIW, OpenTracing docs at least claim to also provide
a zipkin-sink compatible runtime.

There's a whole community that just does distributed monitoring, maybe
someone has time to survey some spaces and see if OpenTracing has any
legs.
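Sean's distinction between an instrumentation API (HTrace, OpenTracing) and a tracing sink (Zipkin), and Josh's earlier question about client and server spans "collapsing", can be illustrated with a minimal, self-contained sketch. This is illustrative code only, assuming nothing about Accumulo's actual tracing classes; every name below is invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the span model shared by HTrace, OpenTracing, and
// Zipkin: an instrumentation API records timed spans with parent/child
// links, and a separate "sink" collects them for later analysis.
public class TraceSketch {
    // Stand-in for a Zipkin-style collector; real sinks receive spans
    // over the network and index them for querying.
    static final List<String> SINK = new ArrayList<>();

    static class Span implements AutoCloseable {
        final String name;
        final Span parent;
        final long start = System.nanoTime();
        Span(String name, Span parent) { this.name = name; this.parent = parent; }
        @Override public void close() {
            long micros = (System.nanoTime() - start) / 1000;
            String parentName = parent == null ? "root" : parent.name;
            SINK.add(name + " (parent=" + parentName + ", " + micros + "us)");
        }
    }

    public static void main(String[] args) {
        // A client-side span; a server-side span attaches as its child,
        // which is how client and server traces "collapse" into one tree.
        try (Span scan = new Span("client.scan", null)) {
            try (Span rpc = new Span("tserver.lookup", scan)) {
                // ... instrumented work would happen here ...
            }
        }
        SINK.forEach(System.out::println);
    }
}
```

The parent/child link is the whole point of a shared instrumentation API: a span recorded server-side with the client's span as its parent lets any sink render one end-to-end tree across process boundaries.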








Re: [DISCUSS] dropping hadoop 2 support

2018-02-27 Thread Josh Elser

+1

AFAIK, this wouldn't have to be anything more than build changes. 
"Dropping hadoop2 support" wouldn't need to include any other changes 
(as adding H3 support didn't require any Java changes). Getting in front 
of the ball to help push people towards newer versions would be a 
welcome change.


On 2/27/18 10:42 AM, Sean Busbey wrote:

Let's get the discussion started early on when we'll drop hadoop 2 support.

As of ACCUMULO-4826 we are poised to have Hadoop 2 and Hadoop 3 supported in 
1.y releases as of 1.9.0. That gives an upgrade path so that folks won't have 
to upgrade both Hadoop and Accumulo at the same time.

How about Accumulo 2.0.0 requires Hadoop 3?

If there's a compelling reason for our users to stay on Hadoop 2.y releases, we 
can keep making Accumulo 1.y releases. Due to the shift away from maintenance 
releases in Hadoop we'll need to get more aggressive in adopting minor releases.
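For context on why this "wouldn't have to be anything more than build changes": supporting both Hadoop lines from one source tree is typically done with Maven profiles, so dropping Hadoop 2 amounts to deleting a profile. A hedged sketch only; the property names and versions below are hypothetical, not Accumulo's actual pom:

```xml
<!-- Hypothetical pom.xml fragment: build-time selection of the Hadoop
     major version via profiles. Dropping Hadoop 2 support would mean
     removing the hadoop2 profile and hard-coding the Hadoop 3 version. -->
<profiles>
  <profile>
    <id>hadoop2</id>
    <activation>
      <property><name>hadoop.profile</name><value>2</value></property>
    </activation>
    <properties>
      <hadoop.version>2.6.5</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop3</id>
    <activation>
      <property><name>hadoop.profile</name><value>3</value></property>
    </activation>
    <properties>
      <hadoop.version>3.0.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

Under this sketch a builder would select a line with e.g. `mvn package -Dhadoop.profile=2`, and no Java changes are needed as long as the code compiles against both APIs.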



Re: [DISCUSS] Proposed formatter change: 100 char lines

2018-02-16 Thread Josh Elser

+1 to not changing min-Java on the release lines that supported Java 7.

Let's just cease activity on these branches instead :)

On 2/16/18 9:55 AM, Sean Busbey wrote:

I'm opposed to requiring Java 8 to build on branches that we claim support
running under Java 7. Historically relying on "compile for earlier target
JDK" has just led to pain down the road when it inevitably doesn't work.

Just make it a recommendation for contributions and have our precommit
checks do the build with Java 8 to verify the formatting has already
happened.

On Thu, Feb 15, 2018 at 10:24 PM, Christopher  wrote:


Primarily for accessibility reasons (screen space with a comfortable font),
but also to support readability for devs working on sensibly-sized screens,
I want to change our formatter to format with 100 char line length instead
of its current 160.

Many of our files need to be reformatted anyway, because the current
formatter is configured incorrectly for Java 8 lambda syntax and needs to
be fixed, so this might be a good opportunity to make the switch.

Also, at this point I think it is sensible to require Java 8 to build
Accumulo... even when building older branches. (Accumulo 1.x will still
support running on Java 7, of course, but Java 8 would be required to build
it). The reason for this requirement is that in order to reduce merge
conflicts and merge bugs between branches, I'd like to update the
formatting across all branches, but the formatter which supports this
syntax requires Java 8 to run. The alternative to requiring Java 8 would be
to only run the formatter when building with Java 8... and skip formatting
if building with Java 7, which might result in some unformatted
contributions, depending on the JRE version used to build.







Re: [DISCUSS] Release 1.7.4 and the 1.9.0

2018-02-16 Thread Josh Elser

SGTM

On 2/15/18 11:10 PM, Ed Coleman wrote:

I'd like to propose that we start the release process for 1.7.4 and then
1.9.0. I'm willing to be the release manager for both if that would
facilitate things.

  


As a strawman - I propose:

  


March 1st - we start the formal release process of 1.7.4, with a goal that
it would be complete and released around March 15th. This would be the last
planned release of the 1.7.x line.

March 19th we start the formal release process of 1.9.0.

  


My real objective is to get a release of 1.9.0 that would be mostly
equivalent to what would have been a 1.8.2, with the API changes for
configuration to support Hadoop 3. There seem to be some fixes in 1.8.1 that
I'd like to see released, and Keith Turner seems to be making some
substantial fixes to performance issues that I'd hope to be able to take
advantage of - however, I would like to have a bound to help limit upgrade
risks.

  


The dates are just a starting point for discussion - if Keith has additional
fixes that we'd like to get in, but needs additional time that's fine with
me, I'm really just pushing for sooner rather than later.

  


Ed Coleman

  





Re: [DISCUSS] Switch to GitHub issues after trial

2018-02-15 Thread Josh Elser

On 2/15/18 6:18 PM, Christopher wrote:

On Thu, Feb 15, 2018 at 5:08 PM Josh Elser <els...@apache.org> wrote:


On 2/15/18 4:56 PM, Christopher wrote:

On Thu, Feb 15, 2018 at 4:55 PM Josh Elser <els...@apache.org> wrote:


On 2/15/18 4:17 PM, Mike Drob wrote:

What do we do if the trial is wildly successful? Is there a migration

plan

for our currently open issues? We have almost 1000 of them.



As Keith said in the other thread, we don't need to have all the
answers up front.


You're right, we don't need to have all of the answers up front.
This is one that I'd like to have some thought put into though.

There's lots of things that are fine to handle as we approach it, but this
one seems like it will lead to us having split issue trackers for _years_
down the road.



This is a good point I hadn't yet considered.

There's not only the migration question that eventually needs to be
answered, but an immediate question of how we will determine when we can
release a version of Accumulo. Are there conventions/features on the GH
issues side that will provide some logical analog to the fixVersion of
JIRA?



These are all great questions... that could be answered with a trial...



Shall I assume then that you are volunteering to handle all issue
management across the disparate systems for all releases?

A trial is a good idea to determine _if we like the system_ and want to
migrate to it. It's not a substitute for determining if the system is
_viable_.



I'm of a different opinion: I already know I like GitHub issues and want to
migrate to it. What I don't know is if it is viable for Accumulo's needs.



Glad you like GH issues, but that isn't what is being discussed 
here. The matter at hand is figuring out the logistics of *how* we 
move to a different issue tracker in a manner that doesn't derail the 
project management of a fully-distributed team.


I'm worried because I feel like there are valid concerns being brought 
up here without acknowledgement of the impact on those who only 
participate in Accumulo digitally.


Re: [DISCUSS] Switch to GitHub issues after trial

2018-02-15 Thread Josh Elser

On 2/15/18 4:56 PM, Christopher wrote:

On Thu, Feb 15, 2018 at 4:55 PM Josh Elser <els...@apache.org> wrote:


On 2/15/18 4:17 PM, Mike Drob wrote:

What do we do if the trial is wildly successful? Is there a migration
plan for our currently open issues? We have almost 1000 of them.

As Keith said in the other thread, we don't need to have all the
answers up front.


You're right, we don't need to have all of the answers up front.
This is one that I'd like to have some thought put into though.

There's lots of things that are fine to handle as we approach it, but this
one seems like it will lead to us having split issue trackers for _years_
down the road.



This is a good point I hadn't yet considered.

There's not only the migration question that eventually needs to be
answered, but an immediate question of how we will determine when we can
release a version of Accumulo. Are there conventions/features on the GH
issues side that will provide some logical analog to the fixVersion of
JIRA?



These are all great questions... that could be answered with a trial...



Shall I assume then that you are volunteering to handle all issue 
management across the disparate systems for all releases?


A trial is a good idea to determine _if we like the system_ and want to 
migrate to it. It's not a substitute for determining if the system is 
_viable_.


Re: [DISCUSS] Switch to GitHub issues after trial

2018-02-15 Thread Josh Elser

On 2/15/18 4:17 PM, Mike Drob wrote:

What do we do if the trial is wildly successful? Is there a migration
plan for our currently open issues? We have almost 1000 of them.

As Keith said in the other thread, we don't need to have all the answers up
front.


You're right, we don't need to have all of the answers up front.
This is one that I'd like to have some thought put into though.

There's lots of things that are fine to handle as we approach it, but this
one seems like it will lead to us having split issue trackers for _years_
down the road.



This is a good point I hadn't yet considered.

There's not only the migration question that eventually needs to be 
answered, but an immediate question of how we will determine when we can 
release a version of Accumulo. Are there conventions/features on the GH 
issues side that will provide some logical analog to the fixVersion of JIRA?


Re: [DISCUSS] Switch to GitHub issues after trial

2018-02-15 Thread Josh Elser
-0 as an initial reaction because I'm still not convinced that GH issues 
provides any additional features or better experience than JIRA does, 
and this change would only serve to fragment an already bare community.


My concerns that would push that -0 to a -1 include (but aren't limited to):

* Documentation/website update for the release process
* Validation that our release process on JIRA has similar 
functionality on GH issues
* Updated contributor docs (removing JIRA-related content, adding an 
explanation as to the change)

* CONTRIBUTING.md updated on relevant repos

- Josh

On 2/15/18 12:05 PM, Mike Walch wrote:

I would like to open discussion on moving from Jira to GitHub issues.
GitHub issues would be enabled for a trial period. After this trial period,
the project would either move completely to GitHub issues or keep using
Jira. Two issue trackers would not be used after the trial period.



Re: Additional options for issue tracking

2018-02-15 Thread Josh Elser

On 2/15/18 12:28 PM, Christopher wrote:

Want to spin out a DISCUSS on the desire to switch, Mike Walch? That
seems to me like it should be the next step.


I thought that's what we were doing. :)


This isn't tagged with DISCUSS in the subject (which I know some 
subscribers of our list filter on) and this thread is convoluted 
already. The intent of this discussion isn't as cut-and-dried as it could be.


Re: Additional options for issue tracking

2018-02-15 Thread Josh Elser


On 2/15/18 11:26 AM, Keith Turner wrote:

On Thu, Feb 15, 2018 at 11:01 AM, Josh Elser <els...@apache.org> wrote:

We tell users that try to file issues on the "unsupported" issue tracker
that they've created the issue in the wrong place and point them to the
right issue tracker.


Personally I think that is ok for a short period. It's like driving on
a road during construction: you know annoyance is unavoidable.
However, no one wants to drive on a road that is under construction
indefinitely. So if we start on this I would like consensus that we
plan to transition from Jira to Github in a timely manner. I don't
think we should try to figure everything out before we start though.
I think it would be good to have a simple starting plan and hill
climb from there in search of a more optimal way of operating.

I don't like the idea of enabling github issues with no consensus that
the goal is to transition away from Jira. Leaving things in that
state for a long period seems bad to me.

If we start with the consensus to transition, it's possible we may
decide not to, and that's ok. I don't think any action needs to be taken
now for that eventuality. We can figure that out as we go during the
transition period.


+1 on all of this.

Want to spin out a DISCUSS on the desire to switch, Mike Walch? That 
seems to me like it should be the next step.


Re: Additional options for issue tracking

2018-02-15 Thread Josh Elser
We tell users that try to file issues on the "unsupported" issue tracker 
that they've created the issue in the wrong place and point them to the 
right issue tracker.


On 2/14/18 10:29 PM, Christopher wrote:

Can you elaborate on what kind of human controls you mean? What if a user
finds the GH issues and creates an issue there? What action should the
developers take?

On Wed, Feb 14, 2018 at 10:27 PM Josh Elser <els...@apache.org> wrote:


I didn't ask for automated controls here -- human controls are fine.

I have already said I am -1 on two concurrent issue trackers. If
developers want to evaluate them, that's fine.

On 2/14/18 10:12 PM, Christopher wrote:

I don't think we have the ability to lock out non-committers from
creating new GH issues if we enable them, nor do I think it would make
sense to do so, since that's a valuable use case to consider during any
trial period before shutting off JIRA.

As for switching immediately to GH issues for non-primary repo (-website,
-examples, -docker, etc...) I think that makes sense since those are
already confusing when filed in the JIRA mixed in with the main repo's
issues.

On Wed, Feb 14, 2018 at 9:25 PM Josh Elser <els...@apache.org> wrote:


I am OK with committers ONLY using GH issues on all repos (with clear
guidance as to what the heck the project is doing) or doing a
full-switch on the other repos.

On 2/14/18 7:00 PM, Mike Miller wrote:

We could do a trial period of GitHub issues for the accumulo sub-repos
(accumulo-website, accumulo-examples...) then after a month or two decide
to switch or not.  That way we won't have duplicate issues or the confusion
of having 2 trackers for one repository.

On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote:


+1 I think it makes sense to try out GitHub before shutting off JIRA. This
period could be limited to a month or two.

On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote:



What if we had an interim transition period, tentatively using GitHub to
determine its suitability for our workflows, and shut off JIRA later?


On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote:



I disagree with Mike in that I don't find JIRA to be so painful that it
necessitates us changing, but I wouldn't block a move to GH issues if we
turn off our JIRA use.

On 2/14/18 4:29 PM, Mike Drob wrote:

@josh - How do you feel about moving from JIRA to GH Issues completely?


On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote:



I believe I already stated -1 the last time this was brought up.

Using two issue trackers is silly.


On 2/14/18 3:30 PM, Mike Walch wrote:


I want to enable GitHub issues for Accumulo's repos. This is not to
replace JIRA but to give contributors more options for issue tracking.
Unless there are objections, I will create an infra ticket this week.






















Re: Additional options for issue tracking

2018-02-14 Thread Josh Elser

I didn't ask for automated controls here -- human controls are fine.

I have already said I am -1 on two concurrent issue trackers. If 
developers want to evaluate them, that's fine.


On 2/14/18 10:12 PM, Christopher wrote:

I don't think we have the ability to lock out non-committers from creating
new GH issues if we enable them, nor do I think it would make sense to do
so, since that's a valuable use case to consider during any trial period
before shutting off JIRA.

As for switching immediately to GH issues for non-primary repo (-website,
-examples, -docker, etc...) I think that makes sense since those are
already confusing when filed in the JIRA mixed in with the main repo's
issues.

On Wed, Feb 14, 2018 at 9:25 PM Josh Elser <els...@apache.org> wrote:


I am OK with committers ONLY using GH issues on all repos (with clear
guidance as to what the heck the project is doing) or doing a
full-switch on the other repos.

On 2/14/18 7:00 PM, Mike Miller wrote:

We could do a trial period of GitHub issues for the accumulo sub-repos
(accumulo-website, accumulo-examples...) then after a month or two decide
to switch or not.  That way we won't have duplicate issues or the confusion
of having 2 trackers for one repository.

On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote:


+1 I think it makes sense to try out GitHub before shutting off JIRA. This
period could be limited to a month or two.

On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote:



What if we had an interim transition period, tentatively using GitHub to
determine its suitability for our workflows, and shut off JIRA later?

On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote:


I disagree with Mike in that I don't find JIRA to be so painful that it
necessitates us changing, but I wouldn't block a move to GH issues if we
turn off our JIRA use.

On 2/14/18 4:29 PM, Mike Drob wrote:

@josh - How do you feel about moving from JIRA to GH Issues completely?

On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote:



I believe I already stated -1 the last time this was brought up.

Using two issue trackers is silly.


On 2/14/18 3:30 PM, Mike Walch wrote:


I want to enable GitHub issues for Accumulo's repos. This is not to
replace JIRA but to give contributors more options for issue tracking.
Unless there are objections, I will create an infra ticket this week.


















Re: Additional options for issue tracking

2018-02-14 Thread Josh Elser
I am OK with committers ONLY using GH issues on all repos (with clear 
guidance as to what the heck the project is doing) or doing a 
full-switch on the other repos.


On 2/14/18 7:00 PM, Mike Miller wrote:

We could do a trial period of GitHub issues for the accumulo sub-repos
(accumulo-website, accumulo-examples...) then after a month or two decide
to switch or not.  That way we won't have duplicate issues or the confusion
of having 2 trackers for one repository.

On Wed, Feb 14, 2018 at 6:12 PM, Mike Walch <mwa...@apache.org> wrote:


+1 I think it makes sense to try out GitHub before shutting off JIRA. This
period could be limited to a month or two.

On Wed, Feb 14, 2018 at 5:59 PM, Christopher <ctubb...@apache.org> wrote:


What if we had an interim transition period, tentatively using GitHub to
determine its suitability for our workflows, and shut off JIRA later?

On Wed, Feb 14, 2018 at 5:51 PM Josh Elser <els...@apache.org> wrote:


I disagree with Mike in that I don't find JIRA to be so painful that it
necessitates us changing, but I wouldn't block a move to GH issues if we
turn off our JIRA use.

On 2/14/18 4:29 PM, Mike Drob wrote:

@josh - How do you feel about moving from JIRA to GH Issues completely?

On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote:



I believe I already stated -1 the last time this was brought up.

Using two issue trackers is silly.


On 2/14/18 3:30 PM, Mike Walch wrote:


I want to enable GitHub issues for Accumulo's repos. This is not to
replace JIRA but to give contributors more options for issue tracking.
Unless there are objections, I will create an infra ticket this week.














Re: Additional options for issue tracking

2018-02-14 Thread Josh Elser
I disagree with Mike in that I don't find JIRA to be so painful that it 
necessitates us changing, but I wouldn't block a move to GH issues if we 
turn off our JIRA use.


On 2/14/18 4:29 PM, Mike Drob wrote:

@josh - How do you feel about moving from JIRA to GH Issues completely?

On Wed, Feb 14, 2018 at 3:26 PM, Josh Elser <els...@apache.org> wrote:


I believe I already stated -1 the last time this was brought up.

Using two issue trackers is silly.


On 2/14/18 3:30 PM, Mike Walch wrote:


I want to enable GitHub issues for Accumulo's repos. This is not to
replace JIRA but to give contributors more options for issue tracking.
Unless there are objections, I will create an infra ticket this week.






Re: Additional options for issue tracking

2018-02-14 Thread Josh Elser

I believe I already stated -1 the last time this was brought up.

Using two issue trackers is silly.

On 2/14/18 3:30 PM, Mike Walch wrote:

I want to enable GitHub issues for Accumulo's repos. This is not to replace
JIRA but to give contributors more options for issue tracking. Unless there
are objections, I will create an infra ticket this week.



Re: [DISCUSS] Any interest in separate client/server tarballs

2018-01-05 Thread Josh Elser
I think it would depend on how much other "stuff" has to come in to support 
the *Clusters. I assumed it would be a bit, but, if it's not, I have no 
objections to a single jar.


On 1/5/18 4:38 PM, Michael Wall wrote:

Yeah, I was thinking more like your second paragraph.  Thinking I would use
the proposed client jar to develop against the MiniAccumuloCluster
(typically the StandaloneMiniAccumuloCluster for me) and then deploy that
code to run against a real cluster.  Would like to flesh that usecase out a
little more.  Do you think it has to be another jar on top of the client
jar?

On Fri, Jan 5, 2018 at 4:31 PM Josh Elser <josh.el...@gmail.com> wrote:


MAC, in its common state, is probably not something we'd want to include
in this proposed tarball. The reasoning being that MAC (and related
classes) aren't something that people would need on your "Hadoop
Cluster" to talk to Accumulo. It's something that can just be obtained
via Maven.

However, if you're more referring to MAC as the generic
"AccumuloCluster" interface (an attempt to make running tests against
MAC and a real Accumulo cluster transparent --
StandaloneAccumuloCluster), then I could see some JAR that we'd include
which would contain the necessary classes (on top of
accumulo-client.jar) for users to run code seamlessly against a
traditional MAC or the StandaloneAccumuloCluster.

On 1/5/18 4:22 PM, Michael Wall wrote:

I like the idea of a client jar that has fewer dependencies. Josh, where
are you thinking the MiniAccumuloCluster fits in here?

On Fri, Jan 5, 2018 at 3:57 PM Christopher <ctubb...@apache.org> wrote:


On Fri, Jan 5, 2018 at 10:30 AM Keith Turner <ke...@deenlo.com> wrote:


On Thu, Jan 4, 2018 at 7:43 PM, Christopher <ctubb...@apache.org> wrote:

tl;dr: I would prefer not to add another tarball as part of our "official"

I am not opposed to replacing the current single tarball with client
and server tarballs. What I find appealing about this is that the
client tarball could have fewer deps.

However I think a lot of thought should be put into the scripts if
this is done.  For example the client tar and server tar should
probably not both have accumulo commands that do different things.



Agreed on Keith's point about the scripts and it requiring some
consideration.



releases, but I'd be in favor of blog instructions, a script, or a build
profile, which users could read/execute/activate to create a client-centric
package.

I've long believed that supporting different downstream packaging scenarios
should be prioritized over upstream binary packaging. I have argued in


This "downstream" packaging could also be done within the Apache Accumulo
project, like accumulo-docker. Creating other packaging
projects within Accumulo is something to consider.



+1; When I say "downstream", it's a role, not an entity. The point is that
it's a distinct activity. accumulo-docker is a perfect example of a
"downstream packaging" project maintained by the upstream community. I find
it frustrating sometimes when supporting users that they can't tell the
difference between what is "Accumulo" and what is "this specific
packaging/configuration/deployment of Accumulo", because we don't make
those lines clear. I think we can draw these lines a bit more clearly.



favor of removing our current tarball entirely, while supporting efforts to

Apache Accumulo needs some sort of tarball that makes it easy to run
the code on a cluster, otherwise how can we test Accumulo on a cluster
for releases?



A binary tarball may be the best for this, but it's little more than the
jars in Maven Central and a few text files. It could be trivially replaced
with a simple script and manifest; it could also be replaced with an RPM, a
docker image, or any number of things. A tarball is just one type of
packaging for Accumulo's binaries.

In any case, I wasn't talking about removing the ability to produce a
binary tarball from source. Only removing it from our release artifacts and
downloads. It is not a popular opinion, but I still think it's reasonable,
with both pros and cons.



enable downstream packaging by modularizing the server code, supporting a
client-API jar (future work), and decoupling code from launch scripts. I
think we should continue to do these kinds of improvements to support
different packaging scenarios downstream, but I'd prefer to avoid
additional "official" binary releases.


I agree, I think if the Accumulo Java code made fewer assumptions about
its runtime environment it would result in code that is easier to maintain
and package for different environments.

In Fluo we have recently done a lot of work in order to support
Docker, Mesos, and Kubernetes.  This work has really cleaned up the
core Fluo code making it easier to run in any environment.

I suspect pulling the Accumulo tarball into a separate

Re: [DISCUSS] Any interest in separate client/server tarballs

2018-01-05 Thread Josh Elser
s.  So it makes sense to discuss them at this
point, but I don't think they should block work on two tarballs if
that seems like a good idea.



Agreed. That discussion can be deferred. Much depends on how it is to be
split up.




Rather than provide additional packages, I'd prefer to work with downstream
to make the source more "packagable" to suit the needs of these downstream
vendor/community packagers. One way we can do that here is by either
documenting what would be needed in a client-centric package, or by
providing a script or build profile to create it from source, so that your
$dayjob or any other downstream packager doesn't have to figure that out
from scratch.

On Thu, Jan 4, 2018 at 7:17 PM Josh Elser <josh.el...@gmail.com> wrote:



Hi,

$dayjob presented me with a request to break up the current tarball into
two: one suitable for "users" and another for the Accumulo services. The
ultimate goal is to make upgrade scenarios a bit easier by having client
and server centric packaging.

The "client" tarball would be something suitable for most users
providing the ability to do things like:

* Launch a java app against Accumulo
* Launch a MapReduce job against Accumulo
* Launch the Accumulo shell

Essentially, the client tarball is just a pared down version of our
"current" tarball and the server-tarball is likely equivalent to our
"current" tarball (given that we have little code which would be
considered client-only).

Obviously, there are many ways to go about this. If there is buy-in from
other folks, adding some new assembly descriptors and making it a part
of the Maven build (perhaps, optionally generated) would be the easiest
in terms of maintenance. However, I don't want to push for that if it's
just going to be ignored by folks. I'll be creating something to support
this one way or another.

Any thoughts/opinions? Would this have any value to other folks?

- Josh









Re: [DISCUSS] Any interest in separate client/server tarballs

2018-01-05 Thread Josh Elser
One thing worth mentioning is that I will be doing this against 
$dayjob's 1.7 based branch to start.


If the consensus is to only do this for a 2.0 Accumulo release, perhaps 
I can use my work to seed that effort? I'm thinking something like a 
document that lists what would be in such a client-tarball.


On 1/5/18 11:35 AM, Keith Turner wrote:

On Fri, Jan 5, 2018 at 11:24 AM, Mike Walch <mwa...@apache.org> wrote:

I like the idea of client tarball.  I think it will make things easier for
users. However, I agree with Keith that we are going to need to split the
accumulo command into accumulo-client & accumulo-server.  I am interested
in helping out with this as I have done a lot of work on the scripts in 2.0.


2.0 would be a good time for disruptive script changes.

Could call client script accumulo and server script accumulo-server.
Just thinking the client script is used more often so shorter would be
nice.



On Thu, Jan 4, 2018 at 7:16 PM, Josh Elser <josh.el...@gmail.com> wrote:


Hi,

$dayjob presented me with a request to break up the current tarball into
two: one suitable for "users" and another for the Accumulo services. The
ultimate goal is to make upgrade scenarios a bit easier by having client
and server centric packaging.

The "client" tarball would be something suitable for most users providing
the ability to do things like:

* Launch a java app against Accumulo
* Launch a MapReduce job against Accumulo
* Launch the Accumulo shell

Essentially, the client tarball is just a pared down version of our
"current" tarball and the server-tarball is likely equivalent to our
"current" tarball (given that we have little code which would be considered
client-only).

Obviously, there are many ways to go about this. If there is buy-in from
other folks, adding some new assembly descriptors and making it a part of
the Maven build (perhaps, optionally generated) would be the easiest in
terms of maintenance. However, I don't want to push for that if it's just
going to be ignored by folks. I'll be creating something to support this
one way or another.

Any thoughts/opinions? Would this have any value to other folks?

- Josh



Re: [DISCUSS] Any interest in separate client/server tarballs

2018-01-05 Thread Josh Elser
I'd be worried about advertising something that we're not treating as
official, as it would languish (unless we create tests that can validate
the result for us).


Thanks for the input.

On 1/4/18 7:43 PM, Christopher wrote:

tl;dr: I would prefer not to add another tarball as part of our "official"
releases, but I'd be in favor of blog instructions, a script, or a build
profile, which users could read/execute/activate to create a client-centric
package.

I've long believed that supporting different downstream packaging scenarios
should be prioritized over upstream binary packaging. I have argued in
favor of removing our current tarball entirely, while supporting efforts to
enable downstream packaging by modularizing the server code, supporting a
client-API jar (future work), and decoupling code from launch scripts. I
think we should continue to do these kinds of improvements to support
different packaging scenarios downstream, but I'd prefer to avoid
additional "official" binary releases.

Rather than provide additional packages, I'd prefer to work with downstream
to make the source more "packagable" to suit the needs of these downstream
vendor/community packagers. One way we can do that here is by either
documenting what would be needed in a client-centric package, or by
providing a script or build profile to create it from source, so that your
$dayjob or any other downstream packager doesn't have to figure that out
from scratch.

On Thu, Jan 4, 2018 at 7:17 PM Josh Elser <josh.el...@gmail.com> wrote:


Hi,

$dayjob presented me with a request to break up the current tarball into
two: one suitable for "users" and another for the Accumulo services. The
ultimate goal is to make upgrade scenarios a bit easier by having client
and server centric packaging.

The "client" tarball would be something suitable for most users
providing the ability to do things like:

* Launch a java app against Accumulo
* Launch a MapReduce job against Accumulo
* Launch the Accumulo shell

Essentially, the client tarball is just a pared down version of our
"current" tarball and the server-tarball is likely equivalent to our
"current" tarball (given that we have little code which would be
considered client-only).

Obviously, there are many ways to go about this. If there is buy-in from
other folks, adding some new assembly descriptors and making it a part
of the Maven build (perhaps, optionally generated) would be the easiest
in terms of maintenance. However, I don't want to push for that if it's
just going to be ignored by folks. I'll be creating something to support
this one way or another.

Any thoughts/opinions? Would this have any value to other folks?

- Josh

Re: [DISCUSS] Any interest in separate client/server tarballs

2018-01-05 Thread Josh Elser

On 1/5/18 9:55 AM, Keith Turner wrote:

Obviously, there are many ways to go about this. If there is buy-in from
other folks, adding some new assembly descriptors and making it a part of
the Maven build (perhaps, optionally generated) would be the easiest in
terms of maintenance. However, I don't want to push for that if it's just
going to be ignored by folks. I'll be creating something to support this one
way or another.

Do you have anything to share?  I would be interested in reviewing this.


Nothing yet. My plan is to take the stock bin-tarball, split the files 
up into two lists to make sure I have the separation correct (that 
things actually work). Then, I can implement it however we want.
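The split Josh describes could start as a simple sorting exercise over the tarball's file list. A minimal sketch follows, where every path, prefix, and classification rule is invented for illustration and not taken from the real tarball layout:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TarballSplit {
    // Hypothetical rule: the launcher, config, and client/shell jars go
    // client-side; server daemons and native code stay server-side.
    public static boolean isClientSide(String path) {
        return path.startsWith("bin/accumulo")
            || path.startsWith("conf/")
            || path.startsWith("lib/accumulo-core")
            || path.startsWith("lib/accumulo-shell");
    }

    public static void main(String[] args) {
        // Invented entries standing in for the stock bin-tarball contents
        List<String> entries = List.of(
            "bin/accumulo", "conf/accumulo-site.xml",
            "lib/accumulo-core.jar", "lib/accumulo-tserver.jar",
            "lib/native/libaccumulo.so");
        Map<Boolean, List<String>> split = new HashMap<>();
        for (String e : entries)
            split.computeIfAbsent(isClientSide(e), k -> new ArrayList<>()).add(e);
        System.out.println("client: " + split.get(true));
        System.out.println("server: " + split.get(false));
    }
}
```

Once the two lists are verified to actually work, the same predicate could feed an assembly descriptor or a packaging script.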



Any thoughts/opinions? Would this have any value to other folks?

This is slightly unrelated, but it would be nice to lower the number
of dependencies for the client side code and possibly shade in
libthrift.


Yup. Agreed.


[DISCUSS] Any interest in separate client/server tarballs

2018-01-04 Thread Josh Elser

Hi,

$dayjob presented me with a request to break up the current tarball into 
two: one suitable for "users" and another for the Accumulo services. The 
ultimate goal is to make upgrade scenarios a bit easier by having client 
and server centric packaging.


The "client" tarball would be something suitable for most users 
providing the ability to do things like:


* Launch a java app against Accumulo
* Launch a MapReduce job against Accumulo
* Launch the Accumulo shell

Essentially, the client tarball is just a pared down version of our 
"current" tarball and the server-tarball is likely equivalent to our 
"current" tarball (given that we have little code which would be 
considered client-only).


Obviously, there are many ways to go about this. If there is buy-in from 
other folks, adding some new assembly descriptors and making it a part 
of the Maven build (perhaps, optionally generated) would be the easiest 
in terms of maintenance. However, I don't want to push for that if it's 
just going to be ignored by folks. I'll be creating something to support 
this one way or another.


Any thoughts/opinions? Would this have any value to other folks?

- Josh


Re: Test replication

2018-01-04 Thread Josh Elser

You can configure a replication peer which is the "local" Accumulo instance.

I think there are some ITs which do this.

On 1/4/18 4:13 PM, Mike Miller wrote:

Trying to test a fix for the 2.0 Monitor
https://issues.apache.org/jira/browse/ACCUMULO-4760 and I wanted to enable
replication.  Does anyone know if there is a way to enable it running a
single Uno instance?  I just need to "turn it on" so I can see if the
Monitor is reporting correctly.



Re: [DISCUSS] Hadoop3 support target?

2017-12-06 Thread Josh Elser



On 12/5/17 6:43 PM, Christopher wrote:

I was wondering about Hadoop 3 shading and whether that would help us. It
would be really nice if it could, or if there was some other class path
solution that was easy.

I think there are two major issues in this thread. The first is the API
problems. The second is the Hadoop 3 support. They are related, but I think
quickly dealing with the API issues can clarify what our options are for
Hadoop 3.


In the spirit of trying to keep these issues separate (I think 
Christopher is correct)


https://github.com/apache/accumulo/pull/332

If we switch to using the new shaded jars from Hadoop, we can avoid 
coupling these issues at all. This comes with caveats as 3.0.0-beta1 is 
busted (https://issues.apache.org/jira/browse/HADOOP-15058). Building a 
3.0.1-SNAPSHOT locally and using that let me run all of the unit tests 
which is promising.


Going to kick off the ITs and see how they fare.


Re: [DISCUSS] Hadoop3 support target?

2017-12-06 Thread Josh Elser



On 12/6/17 2:06 PM, Christopher wrote:

On Wed, Dec 6, 2017 at 1:55 PM Keith Turner <ke...@deenlo.com> wrote:


On Wed, Dec 6, 2017 at 1:43 PM, Josh Elser <els...@apache.org> wrote:



On 12/6/17 12:17 PM, Keith Turner wrote:


On Wed, Dec 6, 2017 at 11:56 AM, Josh Elser<els...@apache.org>  wrote:


Maybe a difference in interpretation:

I was seeing 1a as being source-compatible still. My assumption was that
"Deprecate ClientConfiguration" meant that it would remain in the codebase
-- "replace" as in "replace expected user invocation", not removal of the
old ClientConfiguration and addition of a new ClientConfig class.


Ok, if we deprecate ClientConfiguration, leave it in 2.0, and drop the
extends from ClientConfiguration in 2.0.  Then I am not sure what the
benefit of introducing the new ClientConfig type is?



I read this as leaving the extends in ClientConfiguration and dropping that
in the new ClientConfig. Agree, I wouldn't see the point in changing the
parent class of ClientConfiguration (as that would break things).



I don't think we can leave ClientConfiguration as deprecated and
extending commons config in Accumulo 2.0.  This leaves commons config
1 in the API.

Personally I am not in favor of dropping ClientConfiguration in 2.0,
which is why I was in favor option b.



In the absence of any further input from others, I'll follow along with
whatever you and Josh can agree on. Although I lean towards option 1.a, I
don't feel strongly about either option. We can also do a vote if neither
of you is able (or willing) to convince the other of your preference.


I don't feel strongly enough either way to raise a stink. Color me 
surprised that Keith is the one to encourage quick removals from API :)


If he's OK with it, I'm fine with it. I was trying to err on the side of 
less breakage.


Re: [DISCUSS] Hadoop3 support target?

2017-12-06 Thread Josh Elser



On 12/6/17 12:17 PM, Keith Turner wrote:

On Wed, Dec 6, 2017 at 11:56 AM, Josh Elser<els...@apache.org>  wrote:

Maybe a difference in interpretation:

I was seeing 1a as being source-compatible still. My assumption was that
"Deprecate ClientConfiguration" meant that it would remain in the codebase
-- "replace" as in "replace expected user invocation", not removal of the
old ClientConfiguration and addition of a new ClientConfig class.

Ok, if we deprecate ClientConfiguration, leave it in 2.0, and drop the
extends from ClientConfiguration in 2.0.  Then I am not sure what the
benefit of introducing the new ClientConfig type is?


I read this as leaving the extends in ClientConfiguration and dropping 
that in the new ClientConfig. Agree, I wouldn't see the point in 
changing the parent class of ClientConfiguration (as that would break 
things).


Re: [DISCUSS] Hadoop3 support target?

2017-12-06 Thread Josh Elser

Maybe a difference in interpretation:

I was seeing 1a as being source-compatible still. My assumption was that 
"Deprecate ClientConfiguration" meant that it would remain in the 
codebase -- "replace" as in "replace expected user invocation", not 
removal of the old ClientConfiguration and addition of a new 
ClientConfig class.


On 12/6/17 11:29 AM, Keith Turner wrote:

On Wed, Dec 6, 2017 at 11:28 AM, Josh Elser <els...@apache.org> wrote:

1.a sounds better to me.


why?



A would be the ideal solution, I think B is the next best if A doesn't work.

I need to get the Hadoop3 compatibility fixed, so I'll be investigating the
Hadoop shaded artifacts this week.


On 12/5/17 6:43 PM, Christopher wrote:


I was wondering about Hadoop 3 shading and whether that would help us. It
would be really nice if it could, or if there was some other class path
solution that was easy.

I think there are two major issues in this thread. The first is the API
problems. The second is the Hadoop 3 support. They are related, but I
think
quickly dealing with the API issues can clarify what our options are for
Hadoop 3.




To fix the API, I would like to get consensus on proceeding with this
path:

1. Rename 1.8.2-SNAPSHOT to 1.9.0-SNAPSHOT and deprecate the existing
ZooKeeperInstance constructor which takes a Configuration
  a) Deprecate ClientConfiguration and replace with ClientConfig (or a
better name) which does not extend Configuration or have API leak
problems,
and add a new ZKI constructor for this
  b) Ignore extends for now, and drop it from ClientConfiguration in
2.0
with a break (can't deprecate superclass), and add new ZKI constructor for
more specific ClientConfiguration next to deprecated one
2. Drop deprecated stuff from 2.0 branch (and extends, if option 1.b was
chosen)
3. Plan a 1.9.0 release instead of 1.8.2

I prefer 1.a over 1.b, personally, but I've been tossing back and forth. I
would need input on which is best. There are pros and cons to both,
regarding churn, and source and binary compatibility.




Once we deal with the API, our options for Hadoop 3 become:

A. Use Hadoop 3 shaded artifacts or some other class path solution (such
as
getting lucky identifying a version of commons-beanutils that works for
both)
B. Shade in 1.9 with a breaking change
C. Create a 1.9 version named 2.0, so we can do a breaking change without
semver violation; shade in this version
D. Shade in the branch we're currently calling 2.0

I think we can defer that decision pending some further
investigation/experimentation into what works, and deal with it after
dealing with steps 1-3 above (but soon after, hopefully).



On Tue, Dec 5, 2017 at 3:58 PM Josh Elser <els...@apache.org> wrote:


Another potential suggestion I forgot about: we try to just move to the
Hadoop shaded artifacts. This would invalidate the need to do more, but
I have no idea how "battle-tested" those artifacts are.

On 12/5/17 3:52 PM, Keith Turner wrote:


If we do the following.

* Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method.


* Drop extends from ClientConfig
* Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig
config)

Then this will not be binary compatible, so it will still be painful
in many cases.   It may be source compatible.

For example the following will be source (but not binary) compatible.

 ClientConfiguration cc = new ClientConfiguration(file);
 //when compiled against older version of Accumulo will bind to
method with commons config signature
 //when recompiled will bind to clientconfig version of method
 ZooKeeperInstance zki = new ZooKeeperInstance(cc);

The following would not be source or binary compatible.

 Configuration cc = new ClientConfiguration(file);
 ZooKeeperInstance zki = new ZooKeeperInstance(cc);
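Keith's compatibility point can be demonstrated with a tiny self-contained sketch: Java binds constructor overloads by the argument's static type at compile time, so dropping the Configuration overload is source-compatible for callers that declare ClientConfiguration, while old bytecode still references the removed descriptor and breaks at runtime. The classes below are invented stand-ins, not Accumulo's real types:

```java
// Stand-ins for illustration only
class Configuration {}                            // plays commons-config's role
class ClientConfiguration extends Configuration {}

class ZooKeeperInstance {
    final String bound;
    ZooKeeperInstance(Configuration c)       { bound = "commons-config overload"; }
    ZooKeeperInstance(ClientConfiguration c) { bound = "clientconfig overload"; }
}

public class OverloadDemo {
    public static void main(String[] args) {
        ClientConfiguration cc = new ClientConfiguration();
        // The most specific applicable overload is chosen at COMPILE time,
        // so recompiling against the new API rebinds this call site.
        System.out.println(new ZooKeeperInstance(cc).bound);

        Configuration upcast = new ClientConfiguration();
        // The static type Configuration forces the commons-config overload;
        // this is the case that breaks both source and binary compatibility
        // once that constructor is removed.
        System.out.println(new ZooKeeperInstance(upcast).bound);
    }
}
```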


On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote:




On 12/5/17 3:28 PM, Keith Turner wrote:



On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org>  wrote:



Interesting. What makes you want to deprecate ClientConfig entirely?

I'd be worried about removing without sufficient thought of replacement
around. It would be a bit "churn-y" to introduce yet another way that
clients have to connect (since it was introduced in 1.6-ish?).
Working
around the ClientConfig changes was irritating for the downstream
integrations (Hive, most notably).



Ok, maybe that's a bad idea; not looking to cause pain.  Here were some
of my goals.

 * Remove commons config from API completely via deprecation cycle.
 * Introduce API that supports putting all props needed to connect
to
Accumulo in an API.

I suppose if we want to keep ClientConfig class in API, then there is
no way to remove commons config via a deprecation cycle??  We can't
deprecate the extension of commons config, all we can do is just drop
it at some point.



My line of thinking is 

Re: [DISCUSS] Hadoop3 support target?

2017-12-06 Thread Josh Elser

1.a sounds better to me.

A would be the ideal solution, I think B is the next best if A doesn't work.

I need to get the Hadoop3 compatibility fixed, so I'll be investigating 
the Hadoop shaded artifacts this week.


On 12/5/17 6:43 PM, Christopher wrote:

I was wondering about Hadoop 3 shading and whether that would help us. It
would be really nice if it could, or if there was some other class path
solution that was easy.

I think there are two major issues in this thread. The first is the API
problems. The second is the Hadoop 3 support. They are related, but I think
quickly dealing with the API issues can clarify what our options are for
Hadoop 3.




To fix the API, I would like to get consensus on proceeding with this path:

1. Rename 1.8.2-SNAPSHOT to 1.9.0-SNAPSHOT and deprecate the existing
ZooKeeperInstance constructor which takes a Configuration
 a) Deprecate ClientConfiguration and replace with ClientConfig (or a
better name) which does not extend Configuration or have API leak problems,
and add a new ZKI constructor for this
 b) Ignore extends for now, and drop it from ClientConfiguration in 2.0
with a break (can't deprecate superclass), and add new ZKI constructor for
more specific ClientConfiguration next to deprecated one
2. Drop deprecated stuff from 2.0 branch (and extends, if option 1.b was
chosen)
3. Plan a 1.9.0 release instead of 1.8.2

I prefer 1.a over 1.b, personally, but I've been tossing back and forth. I
would need input on which is best. There are pros and cons to both,
regarding churn, and source and binary compatibility.




Once we deal with the API, our options for Hadoop 3 become:

A. Use Hadoop 3 shaded artifacts or some other class path solution (such as
getting lucky identifying a version of commons-beanutils that works for
both)
B. Shade in 1.9 with a breaking change
C. Create a 1.9 version named 2.0, so we can do a breaking change without
semver violation; shade in this version
D. Shade in the branch we're currently calling 2.0

I think we can defer that decision pending some further
investigation/experimentation into what works, and deal with it after
dealing with steps 1-3 above (but soon after, hopefully).



On Tue, Dec 5, 2017 at 3:58 PM Josh Elser <els...@apache.org> wrote:


Another potential suggestion I forgot about: we try to just move to the
Hadoop shaded artifacts. This would invalidate the need to do more, but
I have no idea how "battle-tested" those artifacts are.

On 12/5/17 3:52 PM, Keith Turner wrote:

If we do the following.

   * Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method.

   * Drop extends from ClientConfig
   * Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig config)

Then this will not be binary compatible, so it will still be painful
in many cases.   It may be source compatible.

For example the following will be source (but not binary) compatible.

ClientConfiguration cc = new ClientConfiguration(file);
//when compiled against older version of Accumulo will bind to
method with commons config signature
//when recompiled will bind to clientconfig version of method
ZooKeeperInstance zki = new ZooKeeperInstance(cc);

The following would not be source or binary compatible.

Configuration cc = new ClientConfiguration(file);
ZooKeeperInstance zki = new ZooKeeperInstance(cc);


On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote:



On 12/5/17 3:28 PM, Keith Turner wrote:


On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org>  wrote:


Interesting. What makes you want to deprecate ClientConfig entirely?

I'd be worried about removing without sufficient thought of replacement
around. It would be a bit "churn-y" to introduce yet another way that
clients have to connect (since it was introduced in 1.6-ish?). Working
around the ClientConfig changes was irritating for the downstream
integrations (Hive, most notably).


Ok, maybe that's a bad idea; not looking to cause pain.  Here were some
of my goals.

* Remove commons config from API completely via deprecation cycle.
* Introduce API that supports putting all props needed to connect to
Accumulo in an API.

I suppose if we want to keep ClientConfig class in API, then there is
no way to remove commons config via a deprecation cycle??  We can't
deprecate the extension of commons config, all we can do is just drop
it at some point.



My line of thinking is that the majority of the time, we're creating a
ClientConfiguration by one of:

* ClientConfiguration#loadDefault()
* new ClientConfiguration(String)
* new ClientConfiguration(File)

Granted, we also inherit/expose a few other things (notably extending
CompositeConfiguration and throwing ConfigurationException). I would be
comfortable with dropping those w/o deprecation. I have not seen evidence
from anyone that they are widely in use by folks (although I've not
explicitly asked, either).






Re: [DISCUSS] Hadoop3 support target?

2017-12-05 Thread Josh Elser
Another potential suggestion I forgot about: we try to just move to the 
Hadoop shaded artifacts. This would invalidate the need to do more, but 
I have no idea how "battle-tested" those artifacts are.


On 12/5/17 3:52 PM, Keith Turner wrote:

If we do the following.

  * Drop ZooKeeperInstance.ZooKeeperInstance(Configuration config) method.
  * Drop extends from ClientConfig
  * Add a method ZooKeeperInstance.ZooKeeperInstance(ClientConfig config)

Then this will not be binary compatible, so it will still be painful
in many cases.   It may be source compatible.

For example the following will be source (but not binary) compatible.

   ClientConfiguration cc = new ClientConfiguration(file);
   //when compiled against older version of Accumulo will bind to
method with commons config signature
   //when recompiled will bind to clientconfig version of method
   ZooKeeperInstance zki = new ZooKeeperInstance(cc);

The following would not be source or binary compatible.

   Configuration cc = new ClientConfiguration(file);
   ZooKeeperInstance zki = new ZooKeeperInstance(cc);


On Tue, Dec 5, 2017 at 3:40 PM, Josh Elser <els...@apache.org> wrote:



On 12/5/17 3:28 PM, Keith Turner wrote:


On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org>  wrote:


Interesting. What makes you want to deprecate ClientConfig entirely?

I'd be worried about removing without sufficient thought of replacement
around. It would be a bit "churn-y" to introduce yet another way that
clients have to connect (since it was introduced in 1.6-ish?). Working
around the ClientConfig changes was irritating for the downstream
integrations (Hive, most notably).


Ok, maybe that's a bad idea; not looking to cause pain.  Here were some
of my goals.

   * Remove commons config from API completely via deprecation cycle.
   * Introduce API that supports putting all props needed to connect to
Accumulo in an API.

I suppose if we want to keep ClientConfig class in API, then there is
no way to remove commons config via a deprecation cycle??  We can't
deprecate the extension of commons config, all we can do is just drop
it at some point.



My line of thinking is that the majority of the time, we're creating a
ClientConfiguration by one of:

* ClientConfiguration#loadDefault()
* new ClientConfiguration(String)
* new ClientConfiguration(File)

Granted, we also inherit/expose a few other things (notably extending
CompositeConfiguration and throwing ConfigurationException). I would be
comfortable with dropping those w/o deprecation. I have not seen evidence
from anyone that they are widely in use by folks (although I've not
explicitly asked, either).


Re: [DISCUSS] Hadoop3 support target?

2017-12-05 Thread Josh Elser



On 12/5/17 3:28 PM, Keith Turner wrote:

On Tue, Dec 5, 2017 at 2:53 PM, Josh Elser<els...@apache.org>  wrote:

Interesting. What makes you want to deprecate ClientConfig entirely?

I'd be worried about removing without sufficient thought of replacement
around. It would be a bit "churn-y" to introduce yet another way that
clients have to connect (since it was introduced in 1.6-ish?). Working
around the ClientConfig changes was irritating for the downstream
integrations (Hive, most notably).

Ok, maybe that's a bad idea; not looking to cause pain.  Here were some
of my goals.

  * Remove commons config from API completely via deprecation cycle.
  * Introduce API that supports putting all props needed to connect to
Accumulo in an API.

I suppose if we want to keep ClientConfig class in API, then there is
no way to remove commons config via a deprecation cycle??  We can't
deprecate the extension of commons config, all we can do is just drop
it at some point.



My line of thinking is that the majority of the time, we're creating a 
ClientConfiguration by one of:


* ClientConfiguration#loadDefault()
* new ClientConfiguration(String)
* new ClientConfiguration(File)

Granted, we also inherit/expose a few other things (notably extending 
CompositeConfiguration and throwing ConfigurationException). I would be 
comfortable with dropping those w/o deprecation. I have not seen 
evidence from anyone that they are widely in use by folks (although I've 
not explicitly asked, either).
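For discussion's sake, a ClientConfig that wraps those common creation paths without extending commons-config might look roughly like the sketch below. Every name here is hypothetical, invented to illustrate the shape of the idea rather than to propose the actual API:

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public final class ClientConfig {   // note: does NOT extend commons-config
    private final Properties props;

    private ClientConfig(Properties props) { this.props = props; }

    // Hypothetical counterpart to new ClientConfiguration(File) / (String path)
    public static ClientConfig fromFile(File f) throws IOException {
        Properties p = new Properties();
        try (Reader r = new FileReader(f)) {
            p.load(r);
        }
        return new ClientConfig(p);
    }

    // Hypothetical counterpart to ClientConfiguration#loadDefault();
    // returns an empty config here instead of scanning default locations
    public static ClientConfig loadDefault() {
        return new ClientConfig(new Properties());
    }

    public String get(String key) { return props.getProperty(key); }
}
```

Because nothing from commons-config appears in the signatures, CompositeConfiguration and ConfigurationException drop out of the public API entirely.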


Re: [DISCUSS] Hadoop3 support target?

2017-12-05 Thread Josh Elser

Interesting. What makes you want to deprecate ClientConfig entirely?

I'd be worried about removing without sufficient thought of replacement 
around. It would be a bit "churn-y" to introduce yet another way that 
clients have to connect (since it was introduced in 1.6-ish?). Working 
around the ClientConfig changes was irritating for the downstream 
integrations (Hive, most notably).


On 12/5/17 1:13 PM, Keith Turner wrote:

I was thinking of a slightly different path forward.

  * Add new entry point and deprecate clientconfig in 1.9
  * Branch 1.9 off 1.8
  * Stop releasing 1.8.x in favor of 1.9.x (they are the same except
for new API)
  * Release 1.9 ASAP
  * Drop clientconfig in 2.0.0
  * Release 2.0.0 early next year... maybe target March

On Tue, Dec 5, 2017 at 12:51 PM, Josh Elser <els...@apache.org> wrote:

Ok, a bridge version seems to be a general path forward. Generally this
would be...

* 1.8 gets relevant commons-config classes/methods deprecated
* 1.9 is 1.8 with those deprecation points removed
* 1.9 has commons-config shaded (maybe?)

IMO, it's critical that we remove the commons-config stuff from our public
API (shame this somehow was let in to begin with).

I think shading our use of commons-config would be a good idea and lessen
our ClientConfiguration scope to being able to read from a file. Trying to
support the breadth of what commons-configuration can do will just get us
into more trouble.


On 12/5/17 12:18 PM, Keith Turner wrote:


If we are going to deprecate, then it would be nice to have a
replacement.  One thing that has irked me about the current Accumulo
entry point is that one can not specify everything needed to connect
to in a single props file.  Specifically, credentials can not be
specified.  It would be really nice to have a new entry point that
allows this.

We could release a 1.9 bridge version.  This version would be based on
1.8 and only include a new entry point. Base it on 1.8 in order to
allow a low risk upgrade for anyone currently using 1.8.  Once people
start using 1.9 they can have code that uses the old and new entry
point running at the same time.  In 2.0 we can drop the problematic
entry point.

Below is a commit to 1.8 where I was experimenting with a new entry point.


https://github.com/keith-turner/accumulo/commit/1c07fa62e9c57bde7e60907595d50f898d03c9d5

This new API would need review; it's rough and there are some things I
don't like about it.  Just sharing for discussion of the general concept,
not advocating for this specific API.
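The general concept — a single properties file carrying everything needed to connect, credentials included — could be sketched as below. The property keys are invented for the example and are not Accumulo's real configuration names:

```java
import java.io.StringReader;
import java.util.Properties;

public class SinglePropsEntryPoint {
    // Parse one properties blob holding connection info AND credentials
    public static Properties parse(String text) throws Exception {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    public static void main(String[] args) throws Exception {
        Properties p = parse(String.join("\n",
            "instance.name=myInstance",
            "instance.zookeepers=zk1:2181,zk2:2181",
            "auth.principal=user",
            "auth.token=secret"));   // credentials live in the same file
        System.out.println("connect as " + p.getProperty("auth.principal")
            + " to " + p.getProperty("instance.zookeepers"));
    }
}
```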

On Mon, Dec 4, 2017 at 6:27 PM, Dave Marion <dmario...@gmail.com> wrote:


There is no reason that you can't mark the offending API methods as
deprecated in a 1.8.x release, then immediately branch off of that to create
a 2.0 and remove the method. Alternatively, we could decide to forego the
semver rules for a specific release and make sure to point it out in the
release notes.

-----Original Message-----
From: Josh Elser [mailto:els...@apache.org]
Sent: Monday, December 4, 2017 6:19 PM
To: dev@accumulo.apache.org
Subject: Re: [DISCUSS] Hadoop3 support target?

Also, just to be clear for everyone else:

This means that we have *no roadmap* at all for Hadoop 3 support because
Accumulo 2.0 is in a state of languish.

This is a severe enough problem to me that I would consider breaking API
compatibility and fixing the API leak in 1.7/1.8. I'm curious what people
other than Christopher think (assuming from his comments/JIRA work that he
disagrees with me).

On 12/4/17 6:12 PM, Christopher wrote:


Agreed.

On Mon, Dec 4, 2017 at 6:01 PM Josh Elser <els...@apache.org> wrote:


Ah, I'm seeing now -- didn't check my inbox appropriately.

I think the fact that code that we don't own has somehow been allowed
to be public API is the smell. That's something that needs to be
rectified sooner than later. By that measure, it can *only* land on
Accumulo 2.0 (which is going to be a major issue for the project).

On 12/4/17 5:58 PM, Josh Elser wrote:


Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper?
Cuz, uh... I made it work already :)

Thanks for the JIRA cleanup. Forgot about that one.

On 12/4/17 5:55 PM, Christopher wrote:


I don't think we can support it with 1.8 or earlier, because of
some serious incompatibilities (namely, ACCUMULO-4611/4753) I think
people are still patching 1.7, so I don't think we've "officially"
EOL'd it.
I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently
stable.

On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote:


What branch do we want to consider Hadoop3 support?

There is a 3.0.0-beta1 release that's been out for a while, and
Hadoop PMC has already done a 3.0.0 RC0. I think it's the right
time to start considering this.

In my poking so far, I've filed ACCUMULO-4753 which I'm working
through now. This does raise the question: where do we want to say
we support Hadoop3? 1.8 or 2.0? (have we "officially" de

Re: [DISCUSS] Hadoop3 support target?

2017-12-05 Thread Josh Elser
Ok, a bridge version seems to be a general path forward. Generally this 
would be...


* 1.8 gets relevant commons-config classes/methods deprecated
* 1.9 is 1.8 with those deprecation points removed
* 1.9 has commons-config shaded (maybe?)

IMO, it's critical that we remove the commons-config stuff from our
public API (shame this somehow was let in to begin with).


I think shading our use of commons-config would be a good idea and 
lessen our ClientConfiguration scope to being able to read from a file. 
Trying to support the breadth of what commons-configuration can do will 
just get us into more trouble.


On 12/5/17 12:18 PM, Keith Turner wrote:

If we are going to deprecate, then it would be nice to have a
replacement.  One thing that has irked me about the current Accumulo
entry point is that one can not specify everything needed to connect
to in a single props file.  Specifically, credentials can not be
specified.  It would be really nice to have a new entry point that
allows this.

We could release a 1.9 bridge version.  This version would be based on
1.8 and only include a new entry point. Base it on 1.8 in order to
allow a low risk upgrade for anyone currently using 1.8.  Once people
start using 1.9 they can have code that uses the old and new entry
point running at the same time.  In 2.0 we can drop the problematic
entry point.

Below is a commit to 1.8 where I was experimenting with a new entry point.

https://github.com/keith-turner/accumulo/commit/1c07fa62e9c57bde7e60907595d50f898d03c9d5

This new API would need review; it's rough and there are some things I
don't like about it.  Just sharing for discussion of the general concept,
not advocating for this specific API.
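The single-props-file idea above can be sketched with plain JDK classes. Everything here is a hypothetical illustration (the property keys and the ClientProps class are not Accumulo API): one file carries the instance, ZooKeepers, and credentials, so client code needs no second configuration source.

```java
import java.util.Properties;

/** Hypothetical sketch: one properties source that also carries
 *  credentials. Key names are illustrative, not Accumulo's. */
public class ClientProps {
    final String instance, zookeepers, principal, token;

    ClientProps(Properties p) {
        this.instance = require(p, "instance.name");
        this.zookeepers = require(p, "instance.zookeepers");
        this.principal = require(p, "auth.principal");
        this.token = require(p, "auth.token");
    }

    // Fail fast on a missing key rather than connecting half-configured.
    private static String require(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null)
            throw new IllegalArgumentException("missing " + key);
        return v;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("instance.name", "accumulo");
        p.setProperty("instance.zookeepers", "zk1:2181");
        p.setProperty("auth.principal", "user");
        p.setProperty("auth.token", "secret");
        ClientProps cp = new ClientProps(p);
        System.out.println(cp.instance + " @ " + cp.zookeepers); // prints "accumulo @ zk1:2181"
    }
}
```

In practice the Properties object would be loaded from the single file the user points the client at; the point is that credentials live alongside the connection settings instead of in a separate code path.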

On Mon, Dec 4, 2017 at 6:27 PM, Dave Marion <dmario...@gmail.com> wrote:

There is no reason that you can't mark the offending API methods as deprecated 
in a 1.8.x release, then immediately branch off of that to create a 2.0 and 
remove the method. Alternatively, we could decide to forego the semver rules 
for a specific release and make sure to point it out in the release notes.

-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Monday, December 4, 2017 6:19 PM
To: dev@accumulo.apache.org
Subject: Re: [DISCUSS] Hadoop3 support target?

Also, just to be clear for everyone else:

This means that we have *no roadmap* at all for Hadoop 3 support because 
Accumulo 2.0 is languishing.

This is a severe enough problem to me that I would consider breaking API 
compatibility and fixing the API leak in 1.7/1.8. I'm curious what people other 
than Christopher think (assuming from his comments/JIRA work that he disagrees 
with me).

On 12/4/17 6:12 PM, Christopher wrote:

Agreed.

On Mon, Dec 4, 2017 at 6:01 PM Josh Elser <els...@apache.org> wrote:


Ah, I'm seeing now -- didn't check my inbox appropriately.

I think the fact that code that we don't own has somehow been allowed
to be public API is the smell. That's something that needs to be
rectified sooner than later. By that measure, it can *only* land on
Accumulo 2.0 (which is going to be a major issue for the project).

On 12/4/17 5:58 PM, Josh Elser wrote:

Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper?
Cuz, uh... I made it work already :)

Thanks for the JIRA cleanup. Forgot about that one.

On 12/4/17 5:55 PM, Christopher wrote:

I don't think we can support it with 1.8 or earlier, because of
some serious incompatibilities (namely, ACCUMULO-4611/4753). I think
people are still patching 1.7, so I don't think we've "officially"
EOL'd it.
I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable.

On Mon, Dec 4, 2017 at 1:14 PM Josh Elser <els...@apache.org> wrote:


What branch do we want to consider Hadoop3 support?

There is a 3.0.0-beta1 release that's been out for a while, and
Hadoop PMC has already done a 3.0.0 RC0. I think it's the right
time to start considering this.

In my poking so far, I've filed ACCUMULO-4753 which I'm working
through now. This does raise the question: where do we want to say
we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated
1.7?)

- Josh

https://issues.apache.org/jira/browse/ACCUMULO-4753











[DISCUSS] Hadoop3 support target?

2017-12-04 Thread Josh Elser

What branch do we want to consider Hadoop3 support?

There is a 3.0.0-beta1 release that's been out for a while, and Hadoop 
PMC has already done a 3.0.0 RC0. I think it's the right time to start 
considering this.


In my poking so far, I've filed ACCUMULO-4753 which I'm working through 
now. This does raise the question: where do we want to say we support 
Hadoop3? 1.8 or 2.0? (have we "officially" deprecated 1.7?)


- Josh

https://issues.apache.org/jira/browse/ACCUMULO-4753


Re: [DISCUSS] Moving away from Thrift

2017-11-17 Thread Josh Elser



On 11/17/17 10:32 AM, Christopher wrote:

On Fri, Nov 17, 2017 at 8:21 AM Josh Elser<els...@apache.org>  wrote:


Did you offer to make the release? See me with commons-vfs a while back.



The current issue with Thrift is not the point. The problems we've
encountered with Thrift were provided as background context only.


I seriously think you are avoiding all of the good that Thrift provides 
us for the sake of a platform to discuss your distaste. Take a look at 
the amount of code that makes up Hadoop's or HBase's RPC implementations 
and the corresponding (often nasty) bugs that have come up over the years.


There are numerous things which Thrift continues to do very well that 
have never become problems for us in Accumulo. Having seen the other 
side of the fence, I would happily take Thrift (warts and all) any day 
over the alternatives.



Your proposal seems to me like you're blowing the situation out of
proportion.



I haven't proposed we do anything beyond "consider" or "discuss". I don't
think "consider" or "discuss" are "out of proportion", even if Thrift had
zero problems.


/me blinks. Ok then.


Re: [DISCUSS] Moving away from Thrift

2017-11-17 Thread Josh Elser
Did you offer to make the release? See me with commons-vfs a while back.

Your proposal seems to me like you're blowing the situation out of
proportion.

On Nov 16, 2017 23:58, "Christopher"  wrote:

> The current Thrift issue has already been fixed with a patch. Their PMC
> needs to release it, though.
>
> Following ASF's commitment to "community over code", I think it would be
> inappropriate for an Apache project to fork another active project while
> that community still exists. It's better to work with them if we can, and
> to use another dependency if we can't. There may be ASF policy against such
> forking, but that may only apply to forking non-ASF projects. In any case,
> I don't think it's a good idea.
>
> Also, even if we are able to resolve the current issue of releasing a
> version without the spammy print statement, I think there's value in
> discussing possible alternatives and their pros/cons. There's no timeline
> for this. Consider this an open-ended discussion regarding RPC
> alternatives. I just want to gather those alternatives into one place to
> discuss.
>
>
> On Thu, Nov 16, 2017 at 11:43 PM Ed Coleman  wrote:
>
> > Have we tried fixing the current issue and then submitting a pull-request?
> >
> > I'd favor first submitting a pull request and any other help that we can
> > provide to get it adopted and released soon - failing that we could fork
> > the project and go from there. That could offer us a path to correct the
> > immediate issue and offer time to consider other alternatives.
> >
> > Ed Coleman
> >
> > -Original Message-
> > From: Christopher [mailto:ctubb...@apache.org]
> > Sent: Thursday, November 16, 2017 11:36 PM
> > To: accumulo-dev 
> > Subject: [DISCUSS] Moving away from Thrift
> >
> > Accumulo Devs,
> >
> > I think it's time we start seriously thinking about moving away from
> > Thrift and considering alternatives.
> > For me, https://issues.apache.org/jira/browse/THRIFT-4062 is becoming the
> > last straw.
> >
> > Thrift is a neat idea, but to be blunt: there seems to be a fundamental
> > lack of care or interest from the Thrift developers at the current moment.
> >
> > Some of the problems we've seen over the years: Every version is
> > fundamentally incompatible with other versions. Repeated flip-flopping
> > regressions seem to occur with each release. Fundamental design concepts
> > like distinguishing server-side exceptions (TApplicationException vs.
> > TException) are undermined without consideration of the initial design.
> > And now, a serious bug (a spammy debugging print statement) has been left
> > in for nearly a year (it still exists in the current version), with no
> > response from the PMC to indicate any willingness to release a fix.
> > Repeated requests to the developer list have gone ignored. And I'm not
> > even counting my requests for assistance debugging a compiler issue on
> > s390x arch, which have also gone ignored.
> >
> > These problems are not exclusive to Accumulo. Many of these are problems
> > that Cassandra has also faced, and I'm sure there are others.
> >
> > It's possible that Thrift can remedy the situation. None of these problems
> > are insurmountable, and none of them are beyond fixes, particularly if we
> > can afford to volunteer more to help out. My intention is not to throw a
> > fellow Apache project under the bus, and I do not intend to give up
> > reporting bugs, and contributing patches to Thrift where appropriate. But,
> > I think we also need to think realistically, and consider alternatives, if
> > Thrift development does not go in a direction which is favorable to
> > Accumulo.
> >
> > So, with that in mind, any suggestions for alternatives? With pros/cons?
> >
> >
>


Re: review board

2017-11-01 Thread Josh Elser

Hey Mark,

Yup, we're still a CTR project. That should be captured on the website 
on our governance page and would require a VOTE by the PMC to change.


We don't have any enforced mechanism for performing reviews. We
used to use Reviewboard a bit, but, as of late, more happens on Github 
with the better integration that Infra has provided. For example, you'll 
find that some projects expressly state certain systems as the ones that 
must be used for code-review. It's not been an issue in Accumulo.


Re: CTR in practice, we do still have a bit of review happening before 
commit -- it's up to the discretion of the committer. If it's not a 
trivial change, you'll likely see the committer waiting for someone else 
to take a look before pushing it. Low-volume and decent test coverage 
helps make this a tenable process.


On 11/1/17 12:28 PM, J. Mark Owens wrote:

Hi,

I'm going through a lot of the Accumulo documentation as I look at 
ACCUMULO-4714 and had a question about some of the information.


Is the review board documentation page still up to date and accurate? I 
clicked the instance link  (https://reviews.apache.org/ ) and noticed 
that the last entry for Accumulo is over a year old. Is this something 
that is still actively utilized or should the information be revised in 
some manner? Is Accumulo still using a Commit-Then-Review policy, etc?


Thanks,
Mark


Re: KerberosToken hell

2017-10-27 Thread Josh Elser
Re #1: You don't actually need to do this unless you've disallowed 
anonymous connections to Zookeeper. Anonymous access to ZK is sufficient 
for Accumulo clients.


Have you made any effort to find existing code in the Accumulo 
repository? For example [1].


The KerberosToken is nothing other than a thin object which ultimately
states that Kerberos credentials are intended to be used for
authentication. Accumulo provides no API for the acquisition or local
storage of those credentials -- thus, it's not appropriate for Accumulo
to provide an API to do this.


[1] 
https://github.com/apache/accumulo/blob/f81a8ec7410e789d11941351d5899b8894c6a322/test/src/main/java/org/apache/accumulo/test/functional/KerberosIT.java#L158-L177



On 10/27/17 1:58 PM, Jorge Machado wrote:

So what is the best way to get an Accumulo connector if the cluster is
kerberized? I have done the following:
1 - add a jaas.conf for ZooKeeper
2 - create an instance (that logs in via SASL into ZooKeeper)
3 - generate an auth token from the KerberosToken class, which logs the user
into the UGI but keeps the state on the KerberosToken object
4 - call UserGroupInformation.loginUserFromKeytab(..., ...) - this is needed
because the Thrift client just gets the user from the UGI, but it is not
there (because KerberosToken keeps the state)
5 - get the connector, passing the token

It would be nicer to keep the state on the UGI instead of on the
KerberosToken. We could create a public method on KerberosToken that logs
the user in via the UGI.
What do you mean by side effects?


Jorge Machado
jo...@jmachado.me<mailto:jo...@jmachado.me>


On 27.10.2017 at 18:31, Josh Elser
<els...@apache.org<mailto:els...@apache.org>> wrote:

Nearly all components in the Hadoop ecosystem require you to perform a login 
with your credentials when writing Java code.

The only exception I'm aware of is ZooKeeper which can automatically perform a 
login via JAAS.

Supporting automatic login via JAAS would be the best path forward here. 
Creating unique side-effects around security credentials in Accumulo is a bad 
idea (which is why the method you're referring to on KerberosToken was marked 
as Deprecated so that we eventually remove it).

On 10/27/17 12:06 PM, Jorge Machado wrote:
Hi Guys,
I just started developing an Accumulo client with Kerberos and SASL. It was
hell to figure out that you need to call
UserGroupInformation.loginUserFromKeytab(principal, keytab) yourself, and
then you can call KerberosToken(principal, keytab), all because we
deprecated the replace-current-user behavior in the UGI. Later on, when we
get the connector, this breaks apart, mainly because my keytab does not
have the same user as the OS account where I'm developing.
It would be nice to just log the user in. What are you guys thinking about
this?
Regards
Jorge
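For reference, the jaas.conf for ZooKeeper mentioned in step 1 above might look like the following sketch, which makes the ZooKeeper client perform a SASL login automatically. The principal and keytab path are placeholders, not values from this thread.

```
/* Sketch of a client-side JAAS login configuration; the ZooKeeper client
   uses the "Client" section by default. Principal and keytab path are
   placeholders. */
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/path/to/client.keytab"
  principal="client@EXAMPLE.COM";
};
```

The JVM picks this up via -Djava.security.auth.login.config=/path/to/jaas.conf.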



Re: Unable to drop the table

2017-10-23 Thread Josh Elser
Please inspect the Accumulo Master log running on the host identified by 
the IP address in the warning message. Look for any Exceptions or ERROR 
messages reported in that log file.


On 10/23/17 12:57 PM, raviteja@gmail.com wrote:

I am getting an error whenever I try to drop the table.

[impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to
ip-Xinternal: (0) for at least 120038 ms



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html



Re: Draft Board Report for Oct 2017

2017-10-10 Thread Josh Elser

Need to strike the "-Description goes here-".

Otherwise, pretty dry report, but I guess there really wasn't much to 
report either.


On 10/9/17 9:58 AM, Michael Wall wrote:

The Apache Accumulo PMC decided to draft its quarterly board
reports on the dev list. Here is a draft of our report which is due by
Wednesday, Oct 11. Please let me know if you have any suggestions,
I plan to submit on the 11th.

Mike

--

## Description:
  - Description goes here- The Apache Accumulo sorted, distributed key/value
store is a robust, scalable, high performance data storage system that
features cell-based access control and customizable server-side processing.
It is based on Google's BigTable design and is built on top of Apache Hadoop,
Zookeeper, and Thrift.


## Issues:
  - There are no issues requiring board attention at this time.

## Activity:
  - There were no new releases during the current reporting period.
  - The 4th annual Accumulo Summit will be held on Oct 16th in Columbia,
MD.  The
PMC has approved the use of the Apache Accumulo trademark.

## Health report:
  - The project remains healthy.  Activity levels on mailing lists, git and
JIRA remain constant.

## PMC changes:
  - Currently 30 PMC members.
  - Ivan Bella was added to the PMC on Tue Jul 11 2017

## Committer base changes:
  - Currently 30 committers.
  - Ivan Bella was added as a committer on Wed Jul 12 2017

## Releases:
  - Last release was 1.7.3 on Sat Mar 25 2017

## Mailing list activity:
  - Nothing significant in the figures

## JIRA activity:
  - 43 JIRA tickets created in the last 3 months
  - 36 JIRA tickets closed/resolved in the last 3 months



Re: [DISCUSS] Guava Dependencies

2017-09-18 Thread Josh Elser



On 9/18/17 2:12 PM, Mike Miller wrote:

Recently tickets have been opened dealing with Guava in Accumulo (see
ACCUMULO-4701 through 4704), in particular the use of Beta classes and
methods.  Use of Guava comes with a few warnings...

 From the Guava README:

1. APIs marked with the @Beta annotation at the class or method level are
subject to change. They can be modified in any way, or even removed, at any
time. If your code is a library itself (i.e. it is used on the CLASSPATH of
users outside your own control), you should not use beta APIs, unless you
repackage them (e.g. using ProGuard).
2. Deprecated non-beta APIs will be removed two years after the release in
which they are first deprecated. You must fix your references before this
time. If you don't, any manner of breakage could result (you are not
guaranteed a compilation error).

I think it is worth a discussion on how to handle Guava dependencies going
forward across the different versions of Accumulo.  The goal would be to
allow use of a newer version of Guava in client applications with
the current supported versions of Accumulo.

Ideally, we could just eliminate any use of Beta Guava code.  But there are
Beta classes that are very useful and some which we already have integrated
into released Accumulo versions.

There seem to be 3 ways to handle Guava dependencies:
1 - jar shading


+1 (favoring #3 too, when not intrusive). We stop "advertising" that we 
include Guava on the classpath and it's no longer our problem. As the 
other part of the thread alludes, if Hadoop brings in a version, fine. 
Accumulo specifically should stop trying to rely on something specific 
coming down from its dependencies and "control its own destiny".


FWIW, HBase has been actively moving to this model and, IMO, it's been 
working well.



2 - copy Guava code into Accumulo
3 - replace Guava code with standard Java

We may have to handle it differently with each version of Accumulo.  For
example, 1.8 has more widespread use of Beta annotated code than 1.7.



Re: [DISCUSS] 1.8.2

2017-09-01 Thread Josh Elser
Given my current understanding (captured in my most recent comment), I
don't think it's a blocker. It doesn't cause any incorrectness in the
system, just unnecessary work in a rare case (active master switches).


If Mike has the time to dig into it some more, vetting some of the cases 
that I outlined wouldn't be a bad idea, but it's not a release blocker.


On 8/31/17 11:04 PM, Christopher wrote:

https://github.com/apache/accumulo/pull/295 is likely a blocker bug, but I
don't really know the full implications of the breakage to the replication
system. It is currently marked under Mike Miller's ACCUMULO-4662, rather
than a separate issue.

On Thu, Aug 31, 2017 at 9:57 AM Michael Wall  wrote:


You are correct Mike, my mistake.  I was looking at
https://issues.apache.org/jira/projects/ACCUMULO/versions/12339245.  Click
the "issues in progress".  Thanks for keeping me honest.

On Thu, Aug 31, 2017 at 9:46 AM Mike Miller 
wrote:


The only one I have open for 1.8.2 is
https://issues.apache.org/jira/browse/ACCUMULO-4662. I will look around
for
any more spots in the code that need to be fixed but I think its pretty
much done.

Was this the other ticket you were talking about Mike?
https://issues.apache.org/jira/browse/ACCUMULO-4342. Its currently
assigned
to you.

On Thu, Aug 31, 2017 at 9:17 AM, Michael Wall  wrote:


Mike Miller has 2 tickets in progress, and the issue Keith mentioned is the
only blocker I saw.  Once those are complete, I am in favor of a 1.8.2
release.  I am happy to do the release again and continue as the 1.8
release manager.  I am also happy to help someone else do that.  It is a
patch release, but we typically still run the continuous ingest testing.

Christopher, do we still have resources to do that?

On Wed, Aug 30, 2017 at 5:34 PM Keith Turner  wrote:


I Am in favor of that after I finish fixing ACCUMULO-4669

On Wed, Aug 30, 2017 at 2:16 PM, ivan bella 

wrote:

Is it time to consider talking about tagging a 1.8.2 release?












Re: [DISCUSS] GitBox

2017-08-18 Thread Josh Elser

Ok, cool. Thanks for the clarification and sorry for the ignorance!

+0

On 8/18/17 10:49 PM, Christopher wrote:

Enabling GH issues is not automatic and would not accompany this change. We
would have to explicitly request that, separately, if we want to do that in
the future.

On Fri, Aug 18, 2017 at 10:30 PM Josh Elser <els...@apache.org> wrote:


My biggest concern was the confusion around the enabling of GH issues
that would accompany this.

As long as we're not trying to do project management in two places
concurrently, I don't care either way.

On 8/18/17 4:51 PM, Mike Drob wrote:

What has changed about the state of Accumulo or GitBox since the last time
we had this discussion? Not saying no here, curious as to why you think we
should revisit though.

On Fri, Aug 18, 2017 at 3:36 PM, Mike Walch <mwa...@apache.org> wrote:


I think we should revisit the discussion of using Apache GitBox for
Accumulo. If you are unfamiliar with it, GitBox enables better GitHub
integration for Apache projects. With GitBox, committers can label GitHub
pull requests, squash and merge them using the GitHub UI, and close them if
they become stale. I think a move to GitBox will help us do a better job of
reviewing and merging pull requests so that contributions are looked at in
a timely manner. The only downside to this move is that the git url for
Accumulo will change.

Does anyone have objections to this?








