Re: [DISCUSS] Time for 1.7.1?

2016-02-07 Thread Josh Elser

I triaged tickets on 1.7.1 over Xmas (along with 1.6.5 for that matter).

JIRA should be the list of what needs to happen. I've been working on 
automating CI, too; otherwise I'd probably have put up an RC already.


Christopher wrote:

I think 1.6.5 and 1.7.1 are both ready to get released. I don't see any
blockers or critical issues.
ACCUMULO-4004 can be bumped to 1.6.6 if it's not ready.
ACCUMULO-4061 can also be bumped.
I would like to see if Keith can wrap up ACCUMULO-4066, though. If not, I
think whatever's remaining can be put in a new ticket, so we can close that
one.

On Sat, Feb 6, 2016 at 11:47 PM Sean Busbey  wrote:


Hiya!

How do folks feel about a 1.7.1 RC?

We're past 8 months since 1.7.0 and up to 118 fixes that are in 1.7.1 and
not the 1.6.1-1.6.4 releases.

Is there anything that must get in before we have a release candidate?

--
Sean





Benchmark project on region location caching from HBase

2016-02-02 Thread Josh Elser
(apparently this is my thing now -- posting interesting snippets from other 
projects)


https://github.com/elliottneilclark/benchmark-hbase-cache

tl;dr: our HBase friends found that a copy-on-write map implementation was a 
good fit for performance (over the Concurrent* data structures).


A glance shows us primarily using an RWLock around a plain-jane TreeMap. 
It would be a neat experiment to reproduce (perhaps after measuring how much 
time we actually spend doing tablet location lookups)
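
For anyone who wants to see the shape of the two approaches, here's a toy 
sketch (my own code, not taken from the benchmark or from our TabletLocator; 
keys model tablet end rows, values the hosting tserver):

  import java.util.Collections;
  import java.util.Map;
  import java.util.NavigableMap;
  import java.util.TreeMap;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  // What we do today (roughly): a TreeMap guarded by a read/write lock.
  class LockedLocationCache {
    private final NavigableMap<String,String> map = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String lookup(String row) {
      lock.readLock().lock();
      try {
        // floorEntry models "find the tablet whose range contains this row"
        Map.Entry<String,String> e = map.floorEntry(row);
        return e == null ? null : e.getValue();
      } finally {
        lock.readLock().unlock();
      }
    }

    void put(String endRow, String location) {
      lock.writeLock().lock();
      try {
        map.put(endRow, location);
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  // The COW idea: readers hit an immutable snapshot with no locking at all;
  // writers copy, mutate, and atomically swap the reference.
  class CowLocationCache {
    private volatile NavigableMap<String,String> snapshot =
        Collections.unmodifiableNavigableMap(new TreeMap<String,String>());

    String lookup(String row) {
      Map.Entry<String,String> e = snapshot.floorEntry(row); // lock-free read
      return e == null ? null : e.getValue();
    }

    synchronized void put(String endRow, String location) {
      NavigableMap<String,String> copy = new TreeMap<>(snapshot);
      copy.put(endRow, location);
      snapshot = Collections.unmodifiableNavigableMap(copy);
    }
  }

The win HBase measured comes from lookups vastly outnumbering location 
updates, so paying a full copy per write to make every read lock-free is a 
good trade for that workload.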


Re: could not start the service

2016-02-02 Thread Josh Elser
See 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_cluster_specification 
and 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_hostnames_in_configuration_files


Accumulo expects the following files ("masters", "monitor", "tracers", 
and "slaves") to be present in /usr/lib/accumulo/conf/. Each line should 
be the hostname of a machine on which you want to run the corresponding 
Accumulo service. "slaves" == TabletServers


gunjanindia wrote:

While running the accumulo init command, I am getting the following error;
it could be some error in accumulo-config.xml.
[accumulo@lumify ~]$ /usr/lib/accumulo/bin/accumulo init
*Could not infer a Monitor role. You need to either define the MONITOR env
variable,* define "/usr/lib/accumulo/conf/monitor", or make sure
"/usr/lib/accumulo/conf/masters" is non-empty.
Please guide me.

regards
Gunjan Kumar



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/could-not-start-the-service-tp16000.html
Sent from the Developers mailing list archive at Nabble.com.


Fwd: [announce] thrift-tools

2016-02-01 Thread Josh Elser

ICYMI

-------- Original Message --------
Subject: [announce] thrift-tools
Date: Thu, 28 Jan 2016 19:43:47 -0800
From: Raúl Gutiérrez Segalés 
Reply-To: u...@thrift.apache.org
To: u...@thrift.apache.org

Hi,

Given I've seen a few threads about debugging Thrift messages, this might
be of general interest:

https://github.com/pinterest/thrift-tools

thrift-tools is a library and a set of tools to introspect Thrift traffic.


Cheers,
-rgs



Re: Automating deploy+testing

2016-01-26 Thread Josh Elser

Thanks Keith. I had forgotten about fats.

I'll take a look at what you had done there as well.

Keith Turner wrote:

I created fats[1], which relies on Fluo Deploy to automatically set up
Accumulo on EC2 and has some scripts for Accumulo testing.  Not sure if
this is still working (since Fluo Deploy has changed a lot since I wrote
fats).

I will take a look at what you did.

https://github.com/keith-turner/fats

On Tue, Jan 26, 2016 at 12:38 AM, Josh Elser<josh.el...@gmail.com>  wrote:


FYI, my evening hack plans this week are going to be centered around
creating some automation around building Accumulo, installing it on some
nodes and then automatically running Continuous Ingest/Randomwalk.

Looking back now, I can hardly believe I've gone this long without
writing something.

If anyone else is thinking about this for the upcoming 1.6.5 and 1.7.1
testing, I'd be happy to work together. The approach should be
platform-agnostic and bootstrap Accumulo by hand (SSH, Bash, and Python) with
arguments (selfishly defaulting to things useful to me). I presently have
it pulling down some Git ref, building, and installing the tarball. I still
need to handle configuration, copying the install/configs to many nodes,
running CI, and reporting some results at the end.

https://github.com/joshelser/accumulo-automated-testing

- Josh





Re: Automating deploy+testing

2016-01-26 Thread Josh Elser
Thanks Dylan. I think I played around with Mike's code too when the book 
was still being written.


Multi-node is definitely a necessity, so I don't think Maven makes a 
good driver (Bash and some scripting language seem to fit nicely). I'm 
hoping to just treat Hadoop/ZooKeeper as prerequisites. Other 
products (e.g. Ambari) will likely handle that just fine for my purposes.


I'll take a look at the AccStack project too. Thanks for linking it.

Dylan Hutchison wrote:

It might be useful to look at the Accumulo quickinstall maven project.

https://github.com/accumulobook/quickinstall


Michael Wall wrote the project and I helped test and patch it.  It runs
from a release but could be modified to build from source instead.

There's also a messier script I used to build Hadoop and Accumulo from
source, including the Hadoop native stuff.

https://github.com/Stevens-GraphGroup/AccStack/blob/master/setup.bash


Cheers, Dylan

On Mon, Jan 25, 2016 at 9:38 PM, Josh Elser<josh.el...@gmail.com>  wrote:


FYI, my evening hack plans this week are going to be centered around
creating some automation around building Accumulo, installing it on some
nodes and then automatically running Continuous Ingest/Randomwalk.

Looking back now, I can hardly believe I've gone this long without
writing something.

If anyone else is thinking about this for the upcoming 1.6.5 and 1.7.1
testing, I'd be happy to work together. The approach should be
platform-agnostic and bootstrap Accumulo by hand (SSH, Bash, and Python) with
arguments (selfishly defaulting to things useful to me). I presently have
it pulling down some Git ref, building, and installing the tarball. I still
need to handle configuration, copying the install/configs to many nodes,
running CI, and reporting some results at the end.

https://github.com/joshelser/accumulo-automated-testing

- Josh





Automating deploy+testing

2016-01-25 Thread Josh Elser
FYI, my evening hack plans this week are going to be centered around 
creating some automation around building Accumulo, installing it on some 
nodes and then automatically running Continuous Ingest/Randomwalk.


Looking back now, I can hardly believe I've gone this long without 
writing something.


If anyone else is thinking about this for the upcoming 1.6.5 and 1.7.1 
testing, I'd be happy to work together. The approach should be 
platform-agnostic and bootstrap Accumulo by hand (SSH, Bash, and Python) with 
arguments (selfishly defaulting to things useful to me). I presently 
have it pulling down some Git ref, building, and installing the tarball. 
I still need to handle configuration, copying the install/configs to many 
nodes, running CI, and reporting some results at the end.


https://github.com/joshelser/accumulo-automated-testing

- Josh
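
Rough shape of what the first step looks like right now, for the curious 
(hosts and the Git ref are placeholders; `-P assemble` is the profile I 
recall the README prescribing for building the dist tarball, so double-check 
that):

  #!/usr/bin/env bash
  set -euo pipefail

  ref="${1:-origin/master}"                 # Git ref to build and test
  workdir="$(mktemp -d)"

  git clone https://github.com/apache/accumulo.git "$workdir/accumulo"
  cd "$workdir/accumulo"
  git checkout "$ref"
  mvn -DskipTests package -P assemble

  # Push the tarball out to each node (node list is a placeholder)
  for host in node1 node2 node3; do
    scp assemble/target/accumulo-*-bin.tar.gz "$host:/tmp/"
  done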


Re: Interesting bug report

2016-01-25 Thread Josh Elser
I've long been waffling about the usefulness of our "infinite retry" 
logic. It's great for daemons. It sucks for humans.


Maybe there's a story in addressing this via ClientConfiguration -- let 
the user tell us the policy they want to follow.


John Vines wrote:

Of course, it's when I hit send that I realize we could mitigate this by
making the client aware of the master state, and if the system is shut down
(which was the case for that ticket), then it can fail quickly with a
descriptive message.

On Mon, Jan 25, 2016 at 10:58 AM John Vines  wrote:


While we want to be fault tolerant, there's a point where we want to
eventually fail. I know we have a couple never ending retry loops that need
to be addressed (https://issues.apache.org/jira/browse/ACCUMULO-1268),
but I'm unsure if queries suffer from this problem.

Unfortunately, fault tolerance is a bit at odds with instant notification
of system issues, since some of the fault tolerance is temporally oriented.
And that ticket lacks context on whether it never fails out vs. fails out
eventually (but takes too long for the user).


On Sun, Jan 24, 2016 at 7:46 PM Christopher  wrote:


I saw this bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1300987

As far as I can tell, they are reporting normal, expected, and desired
behavior of Accumulo as a bug. But, is there something we can do upstream
to enable fast failures in the case of Accumulo not running to support
their use case?

Personally, I don't see how we can reliably detect within the client that
the cluster is down or up, vs. a normal temporary server outage/migration,
since there is no single point of authority for Accumulo to
determine its overall operating status if ZooKeeper is running and no
other
servers are. Am I wrong?





Re: question on data block cache

2016-01-22 Thread Josh Elser

z11373 wrote:

Thanks Josh!
Ok, here I added the Hosted Tablets and Entries columns to the table below for
additional information.
As we can see, the tablets are distributed evenly across all tablet servers,
and the one with the highest load has the highest number of entries (> 1B);
there are a few tablet servers that have > 700M entries, which is not really
far off.
I'd admit the data distribution is likely not great, because URL is used as
the row id value (so many of them share the same prefix), and it's almost
impossible to set the presplit points unless we know what the data values
would be. Instead of specifying split point strings, I wish Accumulo had a
feature to allow us to specify x number of tablets, and it would automatically
split y entries across those x tablets :-)


Remember that tables are not static entities. As you add more data, you 
want them to keep splitting; that's why splitting is driven by tablet 
size. Manual split generation by row is meant to help you "seed" a table.



Follow up questions:
1. The test queries are generated randomly, so theoretically I'd say the
likelihood of most requests going to 1 tablet server should be slim, but given
that URL is used as the row id value, it may be possible. What
does the number in the Query column indicate? Is that the number of entries
returned, or the number of reads?


You should know what your distribution of data looks like and what your 
queries look like. It sounds like you created a big hotspot in your 
table. The Query column is the number of entries per second, IIRC. There's a 
collapsed legend on the data shown on the monitor which you can check.



2. Looking at the sample table below, is there a way to find out the ranges of
all tablets hosted on TServer14? I am thinking of writing a small program to
scan all row ids from that tablet server and find the values which would
become the split points; then I can add the splits to the table and
re-run my tests to see if it resolves the issue.


This information can be extracted from the accumulo.metadata table, but 
I'm not sure if we have a utility for it that I can directly refer you to. 
But yes, find the hotspot of data on that server, and add splits.


You can also try using the listscans shell command to see the open 
scans going to particular tservers.
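
Something like this from the shell should get you going (the table ID "3" is 
hypothetical -- look yours up first -- and double-check the option syntax 
against your version):

  root@myinstance> tables -l                           # shows name => numeric ID
  root@myinstance> scan -t accumulo.metadata -b "3;" -e "3<" -c loc
  root@myinstance> listscans

The metadata rows for table ID 3 run from "3;" through "3<", and the loc 
column family holds the hosting tserver for each tablet.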



Regarding your other question: yes, I saw a few occasions when refreshing the
page where the number of active scans was not 16 and yet there were
waiting scans, so it's not like it happened just 1-2 times.

Server | Hosted Tablets | Entries | Query | Running Scans

TServer1 | 47 | 548.43M | 24 | 0 (0)
TServer2 | 47 | 708.70M | 37 | 0 (0)
TServer3 | 47 | 597.88M | 40 | 0 (0)
TServer4 | 47 | 382.72M | 1 | 0 (0)
TServer5 | 47 | 756.77M | 0 | 0 (0)
TServer6 | 47 | 654.38M | 57 | 0 (0)
TServer7 | 47 | 695.09M | 5 | 0 (0)
TServer8 | 47 | 637.94M | 4 | 0 (0)
TServer9 | 47 | 541.74M | 7 | 0 (0)
TServer10 | 46 | 625.12M | 0 | 0 (0)
TServer11 | 46 | 248.75M | 56 | 0 (0)
TServer12 | 46 | 368.87M | 124 | 0 (0)
TServer13 | 46 | 292.73M | 25 | 0 (0)
TServer14 | 46 | 1.05B | 121 | 16 (435)
TServer15 | 46 | 442.23M | 36 | 0 (0)
TServer16 | 46 | 800.67M | 21 | 0 (0)
TServer17 | 46 | 689.81M | 3 | 0 (0)
TServer18 | 46 | 351.86M | 107 | 0 (0)
TServer19 | 47 | 941.17M | 21 | 0 (0)
TServer20 | 47 | 257.99M | 92 | 0 (0)


Thanks,
Z




--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15937.html
Sent from the Developers mailing list archive at Nabble.com.


Re: question on data block cache

2016-01-22 Thread Josh Elser
A max of 16 but only 8 running is peculiar. It could be a scheduling 
thing, but I'm not entirely sure. Do you see this case regularly? (not 
the one you replied with later, which I already responded about)


z11373 wrote:

Thanks Josh!
Another question came to my mind while observing the Monitor UI :-)

I noticed the Running Scans column occasionally showing '16 (5)',
which I interpret as 21 scans: 16 running and 5 waiting.
What I don't understand is that I also saw something like '8 (3)', which
seems weird to me, because the default max is 16; how come only 8 are
running while 3 have to wait, why not '11 (0)'?

Btw, that number represents each Scanner object created by the client,
right?

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15932.html
Sent from the Developers mailing list archive at Nabble.com.


Re: question on data block cache

2016-01-22 Thread Josh Elser
A Tablet is hosted by one and only one TabletServer at any time. The 
only data you are querying is in Tablet(s) hosted by that one TabletServer.


Add more splits to the table which you are heavily querying to spread 
the load across many TabletServers. Load is parallelized in Accumulo by 
creating multiple Tablets (via splits) which can be spread across 
multiple servers.
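
As a sketch with the client API (the table name and split points are made up; 
real split points should come from sampling your own row IDs):

  import java.util.SortedSet;
  import java.util.TreeSet;

  import org.apache.accumulo.core.client.AccumuloException;
  import org.apache.accumulo.core.client.AccumuloSecurityException;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.TableNotFoundException;
  import org.apache.hadoop.io.Text;

  class AddSplitsExample {
    // Pre-split "mytable" so ranges of the keyspace land on different tservers
    static void preSplit(Connector conn)
        throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
      SortedSet<Text> splits = new TreeSet<>();
      splits.add(new Text("com.example"));  // made-up split points
      splits.add(new Text("com.m"));
      splits.add(new Text("org.a"));
      conn.tableOperations().addSplits("mytable", splits);
    }
  }

The same thing works from the shell via the addsplits command.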


z11373 wrote:

Another observation for which I really want to know the cause.

I see 1 tablet server consistently showing a high number of waiting scans, i.e.
'16 (435)' or '14 (476)' in the Running Scans column, while the other 19 tablet
servers all show '0 (0)' or sometimes a single-digit number. Below is what I've
seen from the Monitor UI for this table at one snapshot, for example, and it is
more or less consistent over time (while the test clients are still running).

Server | Query | Running Scans

TServer1 | 24 | 0 (0)
TServer2 | 37 | 0 (0)
TServer3 | 40 | 0 (0)
TServer4 | 1 | 0 (0)
TServer5 | 0 | 0 (0)
TServer6 | 57 | 0 (0)
TServer7 | 5 | 0 (0)
TServer8 | 4 | 0 (0)
TServer9 | 7 | 0 (0)
TServer10 | 0 | 0 (0)
TServer11 | 56 | 0 (0)
TServer12 | 124 | 0 (0)
TServer13 | 25 | 0 (0)
TServer14 | 121 | 16 (435)
TServer15 | 36 | 0 (0)
TServer16 | 21 | 0 (0)
TServer17 | 3 | 0 (0)
TServer18 | 107 | 0 (0)
TServer19 | 21 | 0 (0)
TServer20 | 92 | 0 (0)

I'd really appreciate it if someone could explain what may have happened, or
point me in the right direction for starting the investigation. For the
record, there are 50 test clients, each of which may create multiple scanners
to query different data from that table.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15934.html
Sent from the Developers mailing list archive at Nabble.com.


Re: question on data block cache

2016-01-22 Thread Josh Elser

Patches welcome for trace-level block cache logging improvements :)

We don't have a lot of granular insight into the block cache. This would 
be nice.
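
In the meantime, about the best you can do is turn up logging on the cache 
class itself (log4j sketch; the logger name below matches the 1.6/1.7 
LruBlockCache class, which today mostly logs periodic stats rather than 
per-eviction detail):

  # in the tserver's log4j configuration
  log4j.logger.org.apache.accumulo.core.file.blockfile.cache.LruBlockCache=TRACE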


z11373 wrote:

Hi Eric,
I actually care less about that, at least for now.
Is there something simpler, like Accumulo telling us when LRU data in the cache
is being pushed out?
I don't need the exact details, just want to know that it indeed happened (and
perhaps pretty frequently, hence we won't hit a 99% data cache hit ratio)

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15929.html
Sent from the Developers mailing list archive at Nabble.com.


Re: Fixing bugs

2016-01-20 Thread Josh Elser

Hi Naveen,

Thanks for your interest.

Find an issue on JIRA[1] which is unassigned and unresolved, and leave a 
comment that you'd like to work on it. One of the committers can then 
assign the issue to you.


You can either submit a patch[2] or a pull request on GitHub[3]. We're 
happy for any contribution you take the time to submit.


- Josh

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ACCUMULO%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC

[2] http://accumulo.apache.org/git.html#contributors
[3] https://github.com/apache/accumulo
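
For the patch route, the flow looks roughly like this (the JIRA ID is 
hypothetical):

  git checkout -b ACCUMULO-1234 origin/master
  # ...make and commit your changes...
  git format-patch --stdout origin/master > ACCUMULO-1234.patch

Then attach the .patch file to the JIRA issue.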

Naveen Shukla wrote:

I want to contribute to the Accumulo project. I want to know how to get
assigned a bug and how to submit a fix. Please tell me the process.
Thanks
Thanks



Re: Setting Up Environment

2016-01-20 Thread Josh Elser

Hi Naveen,

Can you be more specific about what exactly you mean by environment? Are 
you talking about how to write code, or about running Accumulo in general?


For the former, check the website[1]. For the latter, there are numerous ways 
to install Hadoop and ZooKeeper. Many major "vendors" provide virtual 
machines[2][3] with Hadoop already running, which would likely be the 
easiest place to begin.


[1] http://accumulo.apache.org/source.html#ide-configuration-tips
[2] http://hortonworks.com/products/hortonworks-sandbox/
[3] 
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html


Naveen Shukla wrote:

How do I set up my environment to start contributing to this project?



Re: Request to enable new PreCommit job

2016-01-11 Thread Josh Elser

Thanks, Jake!

Jake Farrell wrote:

ACCUMULO has now been added to the precommit filter; the Jenkins job will
start getting kicked off when the patch-available flag is set on your
tickets.

-Jake

On Sun, Jan 10, 2016 at 1:47 PM, Josh Elser<josh.el...@gmail.com>  wrote:


Hiya --

I'm hoping that someone can make the necessary alterations to the
PreCommit-Admin job to enable PreCommit-ACCUMULO-Build [1].

I had copied this from the HBase job, and inherited using the "Hadoop"
label to identify build slaves. LMK if I should set this to something else.

Thanks!

- Josh

[1]
https://builds.apache.org/view/PreCommit%20Builds/job/PreCommit-ACCUMULO-Build/





list unsubscribing (was Re: [DISCUSS] Enable PreCommit build)

2016-01-08 Thread Josh Elser
You unsubscribe yourself from lists just as you subscribed yourself in 
the first place: http://accumulo.apache.org/mailing_list.html


Chris Rigano wrote:

Please drop me from this list.


Re: [DISCUSS] Enable PreCommit build

2016-01-08 Thread Josh Elser

Thanks all for the quick responses!

I'm not sure how much integration there is with GitHub directly. I know 
Yetus is smart enough to find PRs when a link ending in .patch is 
commented on a JIRA issue (e.g. 
https://github.com/apache/accumulo/pull/63.patch).


I would guess that attachments are only looked at if they end in .diff 
or .patch. I'm not sure how the "Patch Available" status affects things 
(I know it was important for the old HadoopQA jobs which eventually 
evolved into Yetus).


Somehow, I already had the appropriate karma to read/modify Jenkins. The 
job I made was cloned from the HBase one. I would guess this is an infra 
ask? I am not entirely sure how I got the karma in the first place. I am 
also not sure if there are other levels which protect our job from 
others changing the configuration (as I have the ability to go and 
modify any), but I'm not super concerned about this.


Christopher wrote:

On by default: +1
Separate JIRA account: makes sense
Concerns about -1/+1 in CTR: it informs the committers, but isn't
binding, so I'm not concerned

Questions:
Is this just for PRs or patches, too? If patches also, how do we identify
patches vs. other JIRA attachments?
How do other committers/PMC tweak/modify the settings? Are permissions
similar to Jenkins at builds.apache.org?

On Fri, Jan 8, 2016 at 1:41 PM John Vines<vi...@apache.org>  wrote:


+1

On Fri, Jan 8, 2016 at 12:58 PM Keith Turner<ke...@deenlo.com>  wrote:


+1

On Fri, Jan 8, 2016 at 12:24 PM, Josh Elser<josh.el...@gmail.com>

wrote:

Hi,

Per the other thread "Yetus Accumulo 'Personality'" [1], I'd like to see
what people think about turning this on by default.

I've been talking to Sean in chat today who had made a suggestion that we
get our own JIRA acct instead of the "Hadoop QA" user. Aside from that, I'm
pretty happy with this.

There is likely further tweaking we can do (e.g. multijdk builds, try the
sunny-day ITs). One big concern is the presence of a -1/+1 in a CTR
community. We would need some docs to be clear that the PreCommit comment
is a tool for vetting contributions, not a bar that must be satisfied prior
to commit (this is a simple website update).

Anywho -- if you have opinions, please let them be heard now. If there
isn't any argument against, I'll move ahead with this in time.


[1]


http://mail-archives.apache.org/mod_mbox/accumulo-dev/201601.mbox/%3c568b5bfc.2080...@gmail.com%3E




Re: offload request to another tserver

2016-01-08 Thread Josh Elser

z11373 wrote:

I recall reading somewhere that Accumulo is able to move tablets to
other tservers if it experiences high load (i.e. read requests) on one
particular tserver; is that correct?


No, we don't rebalance tablet assignments on load. I believe the default 
balancer implementation tries to spread tablets for a specific table 
evenly across the available nodes. This is a pluggable implementation.
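
If you want to experiment there, the per-table hook is the table.balancer 
property. Shell sketch, showing the default class:

  root@myinstance> config -t mytable -f table.balancer    # view current value
  root@myinstance> config -t mytable -s table.balancer=org.apache.accumulo.server.master.balancer.DefaultLoadBalancer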



From the monitor page, one of the tservers that got heavy requests would have
at most 16 active scans, and often a 3-digit number inside the parentheses
(i.e. # of scans waiting). I want to experiment to see if changing the max
active scans would help the test run faster. Is it possible to change that max
active scans number from 16 to a higher number?


Try increasing the following properties (I provided their defaults for you):

tserver.readahead.concurrent.max=16
tserver.scan.files.open.max=100

I remember writing up some really thorough documentation on how to use 
the configuration properties we have to really saturate your nodes, but 
I don't remember where that was. I'll try to find it :)


You can double check the Accumulo user manual on the website to see if 
we have some information there (that should be where we have the info).
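
Both can be set on the fly from the shell, e.g. (the values here are 
arbitrary; watch memory and file-handle pressure if you raise them):

  root@myinstance> config -s tserver.readahead.concurrent.max=32
  root@myinstance> config -s tserver.scan.files.open.max=200
  root@myinstance> config -f tserver.readahead              # verify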


[DISCUSS] Enable PreCommit build

2016-01-08 Thread Josh Elser

Hi,

Per the other thread "Yetus Accumulo 'Personality'" [1], I'd like to see 
what people think about turning this on by default.


I've been talking to Sean in chat today who had made a suggestion that 
we get our own JIRA acct instead of the "Hadoop QA" user. Aside from 
that, I'm pretty happy with this.


There is likely further tweaking we can do (e.g. multijdk builds, try 
the sunny-day ITs). One big concern is the presence of a -1/+1 in a CTR 
community. We would need some docs to be clear that the PreCommit 
comment is a tool for vetting contributions, not a bar that must be 
satisfied prior to commit (this is a simple website update).


Anywho -- if you have opinions, please let them be heard now. If there 
isn't any argument against, I'll move ahead with this in time.



[1] 
http://mail-archives.apache.org/mod_mbox/accumulo-dev/201601.mbox/%3c568b5bfc.2080...@gmail.com%3E


Re: [DISCUSS] Trivial changes and git

2016-01-07 Thread Josh Elser

I think I disagree that they are a lot of work or a big distraction.

The amount of work behind a trivial change (in terms of the tool: git) 
is no different (you commit to all active branches and maybe fix merge 
conflicts).


Personally, I find little nit-picky things good to just piggy-back on 
the original JIRA they were introduced in. For "old" issues (where the 
original changes are long dead or the changes were in the initial 
import), I'd rather see a #2 as below. The reasoning is that if I have 
merge conflicts, I can at least see that it was only formatting changes 
(and not some functionality change).


If there are things you believe are worth fixing, they are by definition 
not a distraction.


Overall, I think it's a non-issue, but, when encountering this: 
"reasonable" amounts of batching of trivial changes would be nice (#2).


Christopher wrote:

Accumulo Devs,

We typically create a JIRA for every change, and then explicitly reference
that JIRA in the git commit log. Sometimes, this seems like a lot of work
(or, at the very least, a big distraction) for *really* trivial changes[1].

My question(s):

What are the pros and cons of being strict about this for trivial issues?
What value does creating a JIRA actually add for such things? Is the
creation of a JIRA issue worth the distraction and time in ALL cases, or
should developer discretion apply? How strict do we want to be about JIRA
references?

* * *

For additional consideration, I've noticed that trivial fixes tend to get
addressed in the following ways:

1. "Drive-by" - rolled into another, unrelated, commit (will get
reviewed/reverted/merged along with a non-trivial issue, simply due to its
vicinity in space or time)
2. "One-JIRA-to-rule-them-all" - a JIRA without much of a description,
created "just so we have a ticket to reference" for several (perhaps
unrelated) trivial fixes
3. "One-JIRA-each" - each trivial issue gets its own JIRA issue, its own
commit, and its own description (many of each are nearly identical)

In each case, it seems like it would have been sufficient to simply
describe the trivial change in a separate git commit which is included in
the next push.

* * *

[1]: By "*really* trivial changes", I mean small typos,
spelling/grammar/punctuation/capitalization issues in docs, formatting,
String literal alignment/wrapping issues, perhaps even missing @Override
annotations, extra semicolons, unneeded warnings suppressions, etc.
Essentially, things that are typically one-off changes that don't change
the behavior or substance of the code or documentation, or that are
self-contained, easily-understood, can be reasonably expected to be
non-controversial, and which couldn't be further elaborated upon with a
description in JIRA. Such changes would not include trivial bug fixes or
feature enhancements, and are more likely to be described as style or typo
fixes.



Re: Yetus Accumulo "Personality"

2016-01-07 Thread Josh Elser
Welp: 
https://issues.apache.org/jira/browse/ACCUMULO-4095?focusedCommentId=15088814&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15088814


Josh Elser wrote:

That's a good question (about a point which I haven't outlined as a
possibility).

I had been poking around some of the PreCommit jobs on ASF Jenkins which
were Yetus-ified and noticed that some of them already had some
parameterization to support 1) using a custom Yetus install or 2) using
some different personality not in Yetus.

Obviously, I would like to make sure that we keep Yetus up to date with
any changes/improvements we make for our own purposes here in Accumulo,
but don't see it as a requirement that we have to stall ourselves waiting.

Although, I have mostly been talking to myself in this thread, so
perhaps the iteration speed is my nights+weekends availability :)

Sean Busbey wrote:

Excellent, thanks for working on this Josh!

Is it worth us running our personality out of our code base rather than
Yetus'? I'm not sure if we need the faster iteration speed or not.

-Sean

On Tue, Jan 5, 2016 at 12:00 AM, Josh Elser<josh.el...@gmail.com> wrote:


FYI https://issues.apache.org/jira/browse/YETUS-263 was merged in last
week.

Eric had also sent me a reply off-list which asked if it would be possible
to do a `mvn verify -Psunny` to run the small set of ITs we have defined in
the pom (Examples, ReadWrite, and ShellServer ITs, IIRC).

Ignoring the issue of whether or not we could even run those tests on ASF
infra, I would guess that we could write a plugin that runs Maven
integration tests (like they run unit tests) and approach the functionality
that way. We could expose some control which determines whether or not
these integration tests were invoked.

I also commented with some output on Matt's patch from ACCUMULO-2493 -- I
found it rather pleasant to run a single command and get a nice summary of
his changes.


Josh Elser wrote:


For those interested in following along with the PreCommit work, see
https://issues.apache.org/jira/browse/YETUS-263

A "personality", in Yetus parlance, defines the the tests/checks that
PreCommit will run against Accumulo. For us, it's pretty simple. The
personality I provided on YETUS-263 will, for a patch/changeset run:

* Checkstyle
* Findbugs
* RAT check
* @author javadoc check
* Some extra whitespace
* All unit tests (not just in the module where changes were made)
* Compiler warnings
* Javadoc warnings
* Presence of new unit tests

One already built in feature that I didn't wire up is ShellCheck for our
shell scripts. This will require a bit of fixing on our end first.

For more general information, Chris Nauroth wrote up a good explanation
for adopting the same approach in ZooKeeper (and did a much better job
than me:
http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201512.mbox/%3CD291EB3B.3504A%25cnauroth%40hortonworks.com%3E).


Anywho, nothing really changing here yet (I'm hoping Sean will write up
instructions about how to configure the Jenkins job for us
https://issues.apache.org/jira/browse/YETUS-245). That would signify a
step for Accumulo specifically. Until then, this is just an FYI.

- Josh






Re: Yetus Accumulo "Personality"

2016-01-05 Thread Josh Elser
That's a good question (about a point which I haven't outlined as a 
possibility).


I had been poking around some of the PreCommit jobs on ASF Jenkins which 
were Yetus-ified and noticed that some of them already had some 
parameterization to support 1) using a custom Yetus install or 2) using 
some different personality not in Yetus.


Obviously, I would like to make sure that we keep Yetus up to date with 
any changes/improvements we make for our own purposes here in Accumulo, 
but don't see it as a requirement that we have to stall ourselves waiting.


Although, I have mostly been talking to myself in this thread, so 
perhaps the iteration speed is my nights+weekends availability :)


Sean Busbey wrote:

Excellent, thanks for working on this Josh!

Is it worth us running our personality out of our code base rather than
Yetus'? I'm not sure if we need the faster iteration speed or not.

-Sean

On Tue, Jan 5, 2016 at 12:00 AM, Josh Elser<josh.el...@gmail.com>  wrote:


FYI https://issues.apache.org/jira/browse/YETUS-263 was merged in last
week.

Eric had also sent me a reply off-list which asked if it would be possible
to do a `mvn verify -Psunny` to run the small set of ITs we have defined in
the pom (Examples, ReadWrite, and ShellServer ITs, IIRC).

Ignoring the issue of whether or not we could even run those tests on ASF
infra, I would guess that we could write a plugin that runs Maven
integration tests (like they run unit tests) and approach the functionality
that way. We could expose some control which determines whether or not these
integration tests were invoked.

I also commented with some output on Matt's patch from ACCUMULO-2493 -- I
found it rather pleasant to run a single command and get a nice summary of
his changes.


Josh Elser wrote:


For those interested in following along with the PreCommit work, see
https://issues.apache.org/jira/browse/YETUS-263

A "personality", in Yetus parlance, defines the the tests/checks that
PreCommit will run against Accumulo. For us, it's pretty simple. The
personality I provided on YETUS-263 will, for a patch/changeset run:

* Checkstyle
* Findbugs
* RAT check
* @author javadoc check
* Some extra whitespace
* All unit tests (not just in the module where changes were made)
* Compiler warnings
* Javadoc warnings
* Presence of new unit tests

One already built in feature that I didn't wire up is ShellCheck for our
shell scripts. This will require a bit of fixing on our end first.

For more general information, Chris Nauroth wrote up a good explanation
for adopting the same approach in ZooKeeper (and did a much better job
than me

http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201512.mbox/%3CD291EB3B.3504A%25cnauroth%40hortonworks.com%3E
).


Anywho, nothing really changing here yet (I'm hoping Sean will write up
instructions about how to configure the Jenkins job for us
https://issues.apache.org/jira/browse/YETUS-245). That would signify a
step for Accumulo specifically. Until then, this is just an FYI.

- Josh






Re: Accumulo-1.6 - Build # 931 - Still Failing

2016-01-05 Thread Josh Elser

Thank you!

Christopher wrote:

Fixed in 288a2e108e1a3a3f0cd3bb083c77c56bec97bf08

On Fri, Jan 1, 2016 at 9:36 PM Josh Elser<josh.el...@gmail.com>  wrote:


Doesn't seem like pulling back the plugin version worked. Will have to look
into this more later.
On Dec 31, 2015 1:24 AM, "Josh Elser"<josh.el...@gmail.com>  wrote:




https://git1-us-west.apache.org/repos/asf?p=accumulo.git;a=commit;h=401350f08807a7544d31e42ce6cc297ab45df8f4

Josh Elser wrote:


Ah! This came in via ACCUMULO-4089 that Christopher did.

I will make the change.

Josh Elser wrote:


Oh, is it because we're actually using JDK6, whereas most of us are just
using JDK7 or 8 but a target of 6? That would jibe with what I'm seeing,
I think.

Christopher wrote:


Dropping it to 2.13 in the 1.6 branch should fix it if that's the case.

On Tue, Dec 29, 2015, 10:57 Keith Turner<ke...@deenlo.com>  wrote:

On Mon, Dec 28, 2015 at 3:39 PM, Josh Elser<josh.el...@gmail.com>  wrote:

Was looking at this failure and saw:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check (check-style)
on project accumulo-project: Execution check-style of goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An
API incompatibility was encountered while executing
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check:
java.lang.UnsupportedClassVersionError:
com/puppycrawl/tools/checkstyle/api/CheckstyleException : Unsupported
major.minor version 51.0

This is 1.6, so we're still using JDK1.6. So, that means that
maven-checkstyle-plugin 2.15 was built with JDK1.7 and is incompatible? I
wonder why we're only seeing this on the build server...


Maybe the build server is actually using JDK 1.6 to build. I suspect most
developers are using JDK 1.7 or 1.8 to build.




-------- Original Message --------
Subject: Accumulo-1.6 - Build # 931 - Still Failing
Date: Mon, 28 Dec 2015 20:11:54 + (UTC)
From: Apache Jenkins Server<jenk...@builds.apache.org>
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org

The Apache Jenkins build system has built Accumulo-1.6 (build #931)

Status: Still Failing

Check console output at
https://builds.apache.org/job/Accumulo-1.6/931/
to view the results.






Re: Yetus Accumulo "Personality"

2016-01-04 Thread Josh Elser

FYI https://issues.apache.org/jira/browse/YETUS-263 was merged in last week.

Eric had also sent me a reply off-list which asked if it would be 
possible to do a `mvn verify -Psunny` to run the small set of ITs we 
have defined in the pom (Examples, ReadWrite, and ShellServer ITs, IIRC).


Ignoring the issue of whether or not we could even run those tests on 
ASF infra, I would guess that we could write a plugin that runs Maven 
integration tests (like they run unit tests) and approach the 
functionality that way. We could expose some control which determines 
whether or not these integration tests were invoked.


I also commented with some output on Matt's patch from ACCUMULO-2493 -- 
I found it rather pleasant to run a single command and get a nice 
summary of his changes.


Josh Elser wrote:

For those interested in following along with the PreCommit work, see
https://issues.apache.org/jira/browse/YETUS-263

A "personality", in Yetus parlance, defines the the tests/checks that
PreCommit will run against Accumulo. For us, it's pretty simple. The
personality I provided on YETUS-263 will, for a patch/changeset run:

* Checkstyle
* Findbugs
* RAT check
* @author javadoc check
* Some extra whitespace
* All unit tests (not just in the module where changes were made)
* Compiler warnings
* Javadoc warnings
* Presence of new unit tests

One already built in feature that I didn't wire up is ShellCheck for our
shell scripts. This will require a bit of fixing on our end first.

For more general information, Chris Nauroth wrote up a good explanation
for adopting the same approach in ZooKeeper (and did a much better job
than me
http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201512.mbox/%3CD291EB3B.3504A%25cnauroth%40hortonworks.com%3E).


Anywho, nothing really changing here yet (I'm hoping Sean will write up
instructions about how to configure the Jenkins job for us
https://issues.apache.org/jira/browse/YETUS-245). That would signify a
step for Accumulo specifically. Until then, this is just an FYI.

- Josh


Re: Off-Heap Caches

2016-01-03 Thread Josh Elser
It should be pretty pluggable, actually. Our current on-heap implementation
is actually taken from HBase. I think that their bucket cache library is
currently the recommended means for off-heap.

I would be excited to see this done and would be happy to try to help out
where possible.
On Jan 3, 2016 4:29 PM, "Michael Moss"  wrote:

> Hello.
>
> I did a quick search of the user/dev lists and JIRA for some
> history/context, though am still curious if there has been any
> consideration to adding an option for off-heap index and/or block caches.
>
> Any comments on how easy/difficult it might be for a project newbie to PR
> this feature?
>
> Thanks!
>
> -Mike
>


Re: Accumulo-1.6 - Build # 931 - Still Failing

2016-01-01 Thread Josh Elser
Doesn't seem like pulling back the plugin version worked. Will have to look
into this more later.
On Dec 31, 2015 1:24 AM, "Josh Elser" <josh.el...@gmail.com> wrote:

>
> https://git1-us-west.apache.org/repos/asf?p=accumulo.git;a=commit;h=401350f08807a7544d31e42ce6cc297ab45df8f4
>
> Josh Elser wrote:
>
>> Ah! This came in via ACCUMULO-4089 that Christopher did.
>>
>> I will make the change.
>>
>> Josh Elser wrote:
>>
>>> Oh, is it because we're actually using JDK6, whereas most of us are just
>>> using JDK7 or 8 but a target of 6? That would jibe with what I'm seeing,
>>> I think.
>>>
>>> Christopher wrote:
>>>
>>>> Dropping it to 2.13 in the 1.6 branch should fix it if that's the case.
>>>>
>>>> On Tue, Dec 29, 2015, 10:57 Keith Turner<ke...@deenlo.com> wrote:
>>>>
>>>> On Mon, Dec 28, 2015 at 3:39 PM, Josh Elser<josh.el...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Was looking at this failure and saw:
>>>>>>
>>>>>> [ERROR] Failed to execute goal
>>>>>> org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check
>>>>>> (check-style)
>>>>>> on project accumulo-project: Execution check-style of goal
>>>>>> org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An
>>>>>>
>>>>> API
>>>>>
>>>>>> incompatibility was encountered while executing
>>>>>> org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check:
>>>>>> java.lang.UnsupportedClassVersionError:
>>>>>> com/puppycrawl/tools/checkstyle/api/CheckstyleException : Unsupported
>>>>>> major.minor version 51.0
>>>>>>
>>>>>> This is 1.6, so we're still using JDK1.6. So, that means that
>>>>>> maven-checkstyle-plugin 2.15 was built with JDK1.7 and is
>>>>>> incompatible? I
>>>>>> wonder why we're only seeing this on the build server...
>>>>>>
>>>>>
>>>>> Maybe the build server is actually using JDK 1.6 to build. I suspect
>>>>> most
>>>>> developers are using JDK 1.7 or 1.8 to build.
>>>>>
>>>>>
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Accumulo-1.6 - Build # 931 - Still Failing
>>>>>> Date: Mon, 28 Dec 2015 20:11:54 + (UTC)
>>>>>> From: Apache Jenkins Server<jenk...@builds.apache.org>
>>>>>> Reply-To: j...@apache.org
>>>>>> To: notificati...@accumulo.apache.org
>>>>>>
>>>>>> The Apache Jenkins build system has built Accumulo-1.6 (build #931)
>>>>>>
>>>>>> Status: Still Failing
>>>>>>
>>>>>> Check console output at
>>>>>> https://builds.apache.org/job/Accumulo-1.6/931/
>>>>>> to view the results.
>>>>>>
>>>>>>
>>>>


Re: Accumulo-1.6 - Build # 931 - Still Failing

2015-12-30 Thread Josh Elser

Ah! This came in via ACCUMULO-4089 that Christopher did.

I will make the change.

Josh Elser wrote:

Oh, is it because we're actually using JDK6, whereas most of us are just
using JDK7 or 8 but a target of 6? That would jibe with what I'm seeing,
I think.

Christopher wrote:

Dropping it to 2.13 in the 1.6 branch should fix it if that's the case.

On Tue, Dec 29, 2015, 10:57 Keith Turner<ke...@deenlo.com> wrote:


On Mon, Dec 28, 2015 at 3:39 PM, Josh Elser<josh.el...@gmail.com> wrote:


Was looking at this failure and saw:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check
(check-style)
on project accumulo-project: Execution check-style of goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An

API

incompatibility was encountered while executing
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check:
java.lang.UnsupportedClassVersionError:
com/puppycrawl/tools/checkstyle/api/CheckstyleException : Unsupported
major.minor version 51.0

This is 1.6, so we're still using JDK1.6. So, that means that
maven-checkstyle-plugin 2.15 was built with JDK1.7 and is
incompatible? I
wonder why we're only seeing this on the build server...


Maybe the build server is actually using JDK 1.6 to build. I suspect
most
developers are using JDK 1.7 or 1.8 to build.




-------- Original Message --------
Subject: Accumulo-1.6 - Build # 931 - Still Failing
Date: Mon, 28 Dec 2015 20:11:54 + (UTC)
From: Apache Jenkins Server<jenk...@builds.apache.org>
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org

The Apache Jenkins build system has built Accumulo-1.6 (build #931)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-1.6/931/
to view the results.





Yetus Accumulo "Personality"

2015-12-30 Thread Josh Elser
For those interested in following along with the PreCommit work, see 
https://issues.apache.org/jira/browse/YETUS-263


A "personality", in Yetus parlance, defines the the tests/checks that 
PreCommit will run against Accumulo. For us, it's pretty simple. The 
personality I provided on YETUS-263 will, for a patch/changeset run:


* Checkstyle
* Findbugs
* RAT check
* @author javadoc check
* Some extra whitespace
* All unit tests (not just in the module where changes were made)
* Compiler warnings
* Javadoc warnings
* Presence of new unit tests

One already built in feature that I didn't wire up is ShellCheck for our 
shell scripts. This will require a bit of fixing on our end first.


For more general information, Chris Nauroth wrote up a good explanation 
for adopting the same approach in ZooKeeper (and did a much better job 
than me 
http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201512.mbox/%3CD291EB3B.3504A%25cnauroth%40hortonworks.com%3E).


Anywho, nothing really changing here yet (I'm hoping Sean will write up 
instructions about how to configure the Jenkins job for us 
https://issues.apache.org/jira/browse/YETUS-245). That would signify a 
step for Accumulo specifically. Until then, this is just an FYI.


- Josh


1.6.5 and 1.7.1

2015-12-30 Thread Josh Elser
Open Issues: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ACCUMULO%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20in%20%281.6.5%2C%201.7.1%29%20ORDER%20BY%20assignee%20ASC


I cleaned up a number of issues tonight. There are a few open (I've 
pinged on the ones I am uncertain about already). Please take the time 
to leave a note or just update the fixVersion so we can start thinking 
about making some RCs and running CI/RW.


Thanks!


Re: Accumulo-1.6 - Build # 931 - Still Failing

2015-12-29 Thread Josh Elser
Oh, is it because we're actually using JDK6, whereas most of us are just 
using JDK7 or 8 but a target of 6? That would jibe with what I'm seeing, 
I think.


Christopher wrote:

Dropping it to 2.13 in the 1.6 branch should fix it if that's the case.

On Tue, Dec 29, 2015, 10:57 Keith Turner<ke...@deenlo.com>  wrote:


On Mon, Dec 28, 2015 at 3:39 PM, Josh Elser<josh.el...@gmail.com>  wrote:


Was looking at this failure and saw:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check (check-style)
on project accumulo-project: Execution check-style of goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An

API

incompatibility was encountered while executing
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check:
java.lang.UnsupportedClassVersionError:
com/puppycrawl/tools/checkstyle/api/CheckstyleException : Unsupported
major.minor version 51.0

This is 1.6, so we're still using JDK1.6. So, that means that
maven-checkstyle-plugin 2.15 was built with JDK1.7 and is incompatible? I
wonder why we're only seeing this on the build server...


Maybe the build server is actually using JDK 1.6 to build.  I suspect most
developers are using JDK 1.7 or 1.8 to build.




-------- Original Message --------
Subject: Accumulo-1.6 - Build # 931 - Still Failing
Date: Mon, 28 Dec 2015 20:11:54 + (UTC)
From: Apache Jenkins Server<jenk...@builds.apache.org>
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org

The Apache Jenkins build system has built Accumulo-1.6 (build #931)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-1.6/931/
to view the results.





Fwd: Accumulo-1.6 - Build # 931 - Still Failing

2015-12-28 Thread Josh Elser

Was looking at this failure and saw:

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check 
(check-style) on project accumulo-project: Execution check-style of goal 
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An 
API incompatibility was encountered while executing 
org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check: 
java.lang.UnsupportedClassVersionError: 
com/puppycrawl/tools/checkstyle/api/CheckstyleException : Unsupported 
major.minor version 51.0


This is 1.6, so we're still using JDK1.6. So, that means that 
maven-checkstyle-plugin 2.15 was built with JDK1.7 and is incompatible? 
I wonder why we're only seeing this on the build server...


-------- Original Message --------
Subject: Accumulo-1.6 - Build # 931 - Still Failing
Date: Mon, 28 Dec 2015 20:11:54 + (UTC)
From: Apache Jenkins Server 
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org

The Apache Jenkins build system has built Accumulo-1.6 (build #931)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-1.6/931/ 
to view the results.


Re: State of our RPCs

2015-12-08 Thread Josh Elser

Well, this doesn't seem to have gone anywhere.

Oh well. For those who are still interested: 
https://issues.apache.org/jira/browse/THRIFT-3479


Josh Elser wrote:

(replying to myself since this was a common sentiment)

I want to be clear that I am most definitely _not_ advocating that we
rip out Thrift. I am still extremely far removed from that decision.

My question was more how can we make Thrift work better for us, than
look for a replacement.

Josh Elser wrote:

Hi --

My adventures in Thrift as a part of ACCUMULO-4065 are finally coming to
a close, it seems. The briefest summary I can give is that our hack to
work around an 0.9.0->0.9.1 compatibility issue ended up creating a bug
in a very obtuse case (when a server answering a oneway Thrift call
threw an RTE or an Error).

Given some other recent chatter in the project, I'm left wondering: what
next?

We've long considered Thrift to be a very useful tool, but extremely
scary to upgrade. I think this is just another sign of this. This leaves
me asking, how do we fix this?

Best as I understand it, Thrift is still a relatively active project (at
least their mailing list archives show it). My impression is that the
Java library is much less so. Most of our issues, it seems to me,
ultimately stem from incompatibilities between libthrift versions and
uncaught performance regressions.

Assuming that to be true, do we need to make a coordinated effort to
improve the upstream libthrift code? Become a part of their community,
focusing on preventing these sorts of issues from ever filtering down to
us? Help them generate and follow compatibility guidelines?

I feel like our strategy over the past few years has been to "avert your
eyes" -- if we don't touch it, it'll hopefully be ok. Perhaps we need to
try something new. Thoughts?

- Josh


Fwd: Re: Trigger for Accumulo table

2015-12-08 Thread Josh Elser

(moving future discussion on listener hooks to dev@a.a.o)

We should take a look at HBase. They have an Observer API that runs 
server side which might serve as a good starting point. IIRC, it's 
designed for implementing this kind of functionality.
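
For reference, the constraint-as-hook idea from the quoted thread below would 
look roughly like this (my sketch against the 1.x Constraint interface; 
NotifyingConstraint and the notification call are hypothetical, and Keith's 
caveat applies -- constraints run before the write is durable):

  import java.util.List;

  import org.apache.accumulo.core.constraints.Constraint;
  import org.apache.accumulo.core.data.Mutation;

  public class NotifyingConstraint implements Constraint {
    @Override
    public String getViolationDescription(short violationCode) {
      return null; // we never report violations
    }

    @Override
    public List<Short> check(Environment env, Mutation mutation) {
      // Hypothetical hook: push mutation.getRow() to an external system here.
      // Note the caveat above: this mutation may still fail to be written.
      return null; // null (or an empty list) means no violations
    }
  }

Installed with connector.tableOperations().addConstraint("mytable",
NotifyingConstraint.class.getName()).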


-------- Original Message --------
Subject: Re: Trigger for Accumulo table
Date:   Tue, 8 Dec 2015 18:28:17 -0500
From:   Adam Fuchs <afu...@apache.org>
Reply-To:   u...@accumulo.apache.org
To: u...@accumulo.apache.org



I totally agree, Christopher. I have also run into a few situations
where it would have been nice to have something like a mutation listener
hook. Particularly in generating indexing and stats records.

Adam


On Tue, Dec 8, 2015 at 5:59 PM, Christopher <ctubb...@apache.org
<mailto:ctubb...@apache.org>> wrote:

In the future, it might be useful to provide a supported API hook
here. It certainly would've made implementing replication easier,
but could also be useful as a notification system.

On Tue, Dec 8, 2015 at 4:51 PM Keith Turner <ke...@deenlo.com
<mailto:ke...@deenlo.com>> wrote:

Constraints are checked before data is written.  In the case of
failures, a constraint may see data that's never successfully
written.


On Tue, Dec 8, 2015 at 4:18 PM, Christopher <ctubb...@apache.org
<mailto:ctubb...@apache.org>> wrote:

Look at org.apache.accumulo.core.constraints.Constraint for
a description and

org.apache.accumulo.core.constraints.DefaultKeySizeConstraint as
an example.

In short, Mutations which are live-ingested into a tablet
server are validated against constraints you specify on the
table. That means that all Mutations written to a table go
through this bit of user-provided code at least once. You
could use that fact to your advantage. However, this would
be highly experimental and might have some caveats to consider.

You can configure a constraint on a table with
connector.tableOperations().addConstraint(...)


On Sun, Dec 6, 2015 at 10:49 PM Thai Ngo
<baothai...@gmail.com <mailto:baothai...@gmail.com>> wrote:

Christopher,

This is interesting! Could you please give me more
details about this?

Thanks,
Thai

On Thu, Dec 3, 2015 at 12:17 PM, Christopher
<ctubb...@apache.org <mailto:ctubb...@apache.org>> wrote:

You could also implement a constraint to notify an
external system when a row is updated.


On Wed, Dec 2, 2015, 22:54 Josh Elser
<josh.el...@gmail.com <mailto:josh.el...@gmail.com>>
wrote:

oops :)

    [1] http://fluo.io/

Josh Elser wrote:
 > Hi Thai,
 >
 > There is no out-of-the-box feature provided with Accumulo that does what
 > you're asking for. Accumulo doesn't provide any functionality to push
 > notifications to other systems. You could potentially maintain other
 > tables/columns in which you maintain the last time a row was updated,
 > but the onus is on your "other services" to read the table to find out
 > when a change occurred (which is probably not scalable at "real time").
 >
 > There are other systems you could likely leverage to solve this,
 > depending on the durability and scalability that your application needs.
 >
 > For a system "close" to Accumulo, you could take a look at Fluo [1]
 > which is an implementation of Google's "Percolator" system. This is a
 > system based on throughput rather than low-latency, so it may not be a
 > good fit for your needs. There are probably other systems in the Apache
 > ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?) that may be
 > helpful to your problem. I'm not an expert on these to recommend on (nor
  

Re: State of our RPCs

2015-12-01 Thread Josh Elser
Gotcha. Billie just noted in IRC that, back when Accumulo had custom RPC code 
a long time ago, Keith's experiment showed Thrift to be 500x faster. That's 
probably how we got started.


Since then, I think it's just been a good solution that works well in 
the end (development work can be hard). The lack of an integrated 
serialization and transport system elsewhere is what kept us with Thrift. 
gRPC is the only true replacement I could come up with, but I may be 
missing others.


dlmar...@comcast.net wrote:

I'm not suggesting that we replace Thrift (nor am I signing up to do it); I'm
just asking for the basis of the decision and if it's time to revisit. I'm
totally ok with a 'no' answer.

- Original Message -

From: "Josh Elser"<josh.el...@gmail.com>
To: dev@accumulo.apache.org
Sent: Tuesday, December 1, 2015 2:36:18 PM
Subject: Re: State of our RPCs

To play devil's advocate: I'm not sure if it's quite that simple. For
example, Avro has been around since 2009, but I don't think it'd be fair
to compare Avro circa 2009 to Avro circa 2015.

David Medinets wrote:

What new protocols have been introduced since the Thrift decisions? Can
someone provide pros and cons for that limited set of protocols?

On Tue, Dec 1, 2015 at 1:02 PM,<dlmar...@comcast.net>  wrote:


What was it about Thrift that drove us to use it? Was it the bindings for
multiple languages? Should this decision be revisited?

- Original Message -

From: "Josh Elser"<josh.el...@gmail.com>
To: "dev"<dev@accumulo.apache.org>
Sent: Tuesday, December 1, 2015 12:49:26 PM
Subject: State of our RPCs

Hi --

My adventures in Thrift as a part of ACCUMULO-4065 are finally coming to
a close, it seems. The briefest summary I can give is that our hack to
work around an 0.9.0->0.9.1 compatibility issue ended up creating a bug
in a very obtuse case (when a server answering a oneway Thrift call
threw an RTE or an Error).

Given some other recent chatter in the project, I'm left wondering: what
next?

We've long considered Thrift to be a very useful tool, but extremely
scary to upgrade. I think this is just another sign of this. This leaves
me asking, how do we fix this?

Best as I understand it, Thrift is still a relatively active project (at
least their mailing list archives show it). My impression is that the
Java library is much less so. Most of our issues, it seems to me,
ultimately stem from incompatibilities between libthrift versions and
uncaught performance regressions.

Assuming that to be true, do we need to make a coordinated effort to
improve the upstream libthrift code? Become a part of their community,
focusing on preventing these sorts of issues from ever filtering down to
us? Help them generate and follow compatibility guidelines?

I feel like our strategy over the past few years has been to "avert your
eyes" -- if we don't touch it, it'll hopefully be ok. Perhaps we need to
try something new. Thoughts?

- Josh







Re: State of our RPCs

2015-12-01 Thread Josh Elser

(replying to myself since this was a common sentiment)

I want to be clear that I am most definitely _not_ advocating that we 
rip out Thrift. I am still extremely far removed from that decision.


My question was more how can we make Thrift work better for us, than 
look for a replacement.


Josh Elser wrote:

Hi --

My adventures in Thrift as a part of ACCUMULO-4065 are finally coming to
a close, it seems. The briefest summary I can give is that our hack to
work around an 0.9.0->0.9.1 compatibility issue ended up creating a bug
in a very obtuse case (when a server answering a oneway Thrift call
threw an RTE or an Error).

Given some other recent chatter in the project, I'm left wondering: what
next?

We've long considered Thrift to be a very useful tool, but extremely
scary to upgrade. I think this is just another sign of this. This leaves
me asking, how do we fix this?

Best as I understand it, Thrift is still a relatively active project (at
least their mailing list archives show it). My impression is that the
Java library is much less so. Most of our issues, it seems to me,
ultimately stem from incompatibilities between libthrift versions and
uncaught performance regressions.

Assuming that to be true, do we need to make a coordinated effort to
improve the upstream libthrift code? Become a part of their community,
focusing on preventing these sorts of issues from ever filtering down to
us? Help them generate and follow compatibility guidelines?

I feel like our strategy over the past few years has been to "avert your
eyes" -- if we don't touch it, it'll hopefully be ok. Perhaps we need to
try something new. Thoughts?

- Josh


Re: State of our RPCs

2015-12-01 Thread Josh Elser
IMO, Thrift provides a lot out of the box, and, all things considered, 
reduces our complexity greatly. For example:


* THsHaServer is great for our execution model
* We can inherit things like SSL and SASL (Kerberos)
* An _implemented_ RPC service (stuff like Hadoop RPC and Protobuf just 
provide the Service definition but not the actual transport).
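
For reference, standing up a THsHaServer takes only a handful of lines
(a minimal sketch; "MyService" and "MyServiceHandler" are hypothetical
stand-ins for a generated Thrift service and its implementation):

  import org.apache.thrift.server.THsHaServer;
  import org.apache.thrift.transport.TNonblockingServerSocket;

  // Half-sync/half-async: async I/O on a selector thread, synchronous
  // method invocation on a worker pool.
  TNonblockingServerSocket transport = new TNonblockingServerSocket(9090);
  THsHaServer.Args args = new THsHaServer.Args(transport);
  args.processor(new MyService.Processor<MyService.Iface>(new MyServiceHandler()));
  THsHaServer server = new THsHaServer(args);
  server.serve(); // blocks, running the event loop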


I don't think the problems we have with Thrift are so great that we 
should throw it away.


dlmar...@comcast.net wrote:

What was it about Thrift that drove us to use it? Was it the bindings for 
multiple languages? Should this decision be revisited?

- Original Message -

From: "Josh Elser"<josh.el...@gmail.com>
To: "dev"<dev@accumulo.apache.org>
Sent: Tuesday, December 1, 2015 12:49:26 PM
Subject: State of our RPCs

Hi --

My adventures in Thrift as a part of ACCUMULO-4065 are finally coming to
a close, it seems. The briefest summary I can give is that our hack to
work around an 0.9.0->0.9.1 compatibility issue ended up creating a bug
in a very obtuse case (when a server answering a oneway Thrift call
threw an RTE or an Error).

Given some other recent chatter in the project, I'm left wondering: what
next?

We've long considered Thrift to be a very useful tool, but extremely
scary to upgrade. I think this is just another sign of this. This leaves
me asking, how do we fix this?

Best as I understand it, Thrift is still a relatively active project (at
least their mailing list archives show it). My impression is that the
Java library is much less so. Most of our issues, it seems to me,
ultimately stem from incompatibilities between libthrift versions and
uncaught performance regressions.

Assuming that to be true, do we need to make a coordinated effort to
improve the upstream libthrift code? Become a part of their community,
focusing on preventing these sorts of issues from ever filtering down to
us? Help them generate and follow compatibility guidelines?

I feel like our strategy over the past few years has been to "avert your
eyes" -- if we don't touch it, it'll hopefully be ok. Perhaps we need to
try something new. Thoughts?

- Josh




Re: State of our RPCs

2015-12-01 Thread Josh Elser
To play devil's advocate: I'm not sure if it's quite that simple. For 
example, Avro has been around since 2009, but I don't think it'd be fair 
to compare Avro circa 2009 to Avro circa 2015.


David Medinets wrote:

What new protocols have been introduced since the Thrift decisions? Can
someone provide pros and cons for that limited set of protocols?

On Tue, Dec 1, 2015 at 1:02 PM,<dlmar...@comcast.net>  wrote:


What was it about Thrift that drove us to use it? Was it the bindings for
multiple languages? Should this decision be revisited?

- Original Message -

From: "Josh Elser"<josh.el...@gmail.com>
To: "dev"<dev@accumulo.apache.org>
Sent: Tuesday, December 1, 2015 12:49:26 PM
Subject: State of our RPCs

Hi --

My adventures in Thrift as a part of ACCUMULO-4065 are finally coming to
a close, it seems. The briefest summary I can give is that our hack to
work around an 0.9.0->0.9.1 compatibility issue ended up creating a bug
in a very obtuse case (when a server answering a oneway Thrift call
threw an RTE or an Error).

Given some other recent chatter in the project, I'm left wondering: what
next?

We've long considered Thrift to be a very useful tool, but extremely
scary to upgrade. I think this is just another sign of this. This leaves
me asking, how do we fix this?

Best as I understand it, Thrift is still a relatively active project (at
least their mailing list archives show it). My impression is that the
Java library is much less so. Most of our issues, it seems to me,
ultimately stem from incompatibilities between libthrift versions and
uncaught performance regressions.

Assuming that to be true, do we need to make a coordinated effort to
improve the upstream libthrift code? Become a part of their community,
focusing on preventing these sorts of issues from ever filtering down to
us? Help them generate and follow compatibility guidelines?

I feel like our strategy over the past few years has been to "avert your
eyes" -- if we don't touch it, it'll hopefully be ok. Perhaps we need to
try something new. Thoughts?

- Josh






Re: State of our RPCs

2015-12-01 Thread Josh Elser
At a glance, it looks like this doesn't provide any transport, just the 
marshaling of data. I really don't want to own that logic (for example, 
I've seen the thousands of lines of code in HBase just for setting up 
their RPC server).


John R. Frank wrote:


It might be worth considering CBOR http://cbor.io/

jrf



Re: Typo in Accumulo v1.7 documentation

2015-11-30 Thread Josh Elser

Hi Kent,

Thanks for letting us know. You're more than welcome to create an issue 
on JIRA [1] for Accumulo for documentation.


As always, we're even happier to get a patch to fix it too :). The 
usermanual can be found in the docs/ directory [2]. We probably have the 
typo in all actively maintained Git branches, too (1.6, 1.7 and master).


[1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
[2] https://github.com/apache/accumulo/tree/1.7/docs/src/main/asciidoc

Kent McHenry wrote:

Hi,

I'm not sure how to report a typo in the documentation, but here goes:

On this page:
https://accumulo.apache.org/1.7/accumulo_user_manual.html#_batchscanner

In the following code sample "bscan" is created in the first line, but in
the ending for loop it is incorrectly referenced by "scan" (without the b):

ArrayList<Range> ranges = new ArrayList<Range>(); // populate list of
ranges ... BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
bscan.setRanges(ranges); bscan.fetchColumnFamily("attributes");
for(Entry<Key,Value> entry : scan) { System.out.println(entry.getValue()); }

Cheers,
Kent



Re: hasNext throws weird exception

2015-11-17 Thread Josh Elser

These things happen :)

z11373 wrote:

Found out the issue, my bad :-(



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/hasNext-throws-weird-exception-tp15586p15587.html
Sent from the Developers mailing list archive at Nabble.com.


Re: total table rows

2015-11-12 Thread Josh Elser
Ick, that's kind of a pain. We should probably have some kind of utility 
to compute this for you.


You'd likely want to treat the row as a byte[] (instead of thinking in 
terms of characters) and decrement the last byte in the array. We have a 
static method on Range.followingPrefix(Text) which goes the opposite 
way. You could try to take that approach, while we (hopefully) consider 
adding a Range.previousRow(Text) or something.
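
Roughly, the byte-level decrement I have in mind (a minimal sketch of a
hypothetical helper -- nothing like this exists in the API today; note
that rows sorting between the result and the original, e.g. "jz" before
"k", would also fall inside a deleteRows() range started at the result):

  import java.util.Arrays;
  import org.apache.hadoop.io.Text;

  // Return a row which sorts just before the given row.
  static Text rowBefore(Text row) {
    // Text's backing array can be longer than its length; copy exactly
    byte[] bytes = Arrays.copyOf(row.getBytes(), row.getLength());
    int last = bytes.length - 1;
    if (bytes[last] == 0) {
      // 0x00 can't be decremented; drop it, so "k\x00" becomes "k"
      return new Text(Arrays.copyOf(bytes, last));
    }
    bytes[last]--; // e.g. "k" becomes "j"
    return new Text(bytes);
  }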


z11373 wrote:

Thanks William! This is indeed what I was looking for.

Text startRow = new Text("k");
Text endRow = new Text("r");
ops.deleteRows("myTable", startRow, endRow);

 From Accumulo book, it said "When you specify start and end rows, the
deleteRows() method will remove rows that sort after but not including the
start row, and rows that sort before and including the end row."

Since it said not including the start row, what is the recommended way to
get the row before (note the row id can be number or letter)? I wonder why
it doesn't include the start row, otherwise my job would be easier.


Thanks,
Z




--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484p15545.html
Sent from the Developers mailing list archive at Nabble.com.


Re: data comparison tool

2015-11-12 Thread Josh Elser
Yep, that's an easy way to check. It can just be slow depending on how 
much data you have.
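
For reference, the naive merge you describe is only a few lines (a
minimal sketch, assuming Connectors connA and connB to the two instances,
identical auths on both, and that timestamps are allowed to differ):

  // uses org.apache.accumulo.core.client.Scanner and
  // org.apache.accumulo.core.data.{Key, Value, PartialKey}
  Scanner a = connA.createScanner("mytable", auths);
  Scanner b = connB.createScanner("mytable", auths);
  Iterator<Entry<Key,Value>> itA = a.iterator(), itB = b.iterator();
  while (itA.hasNext() && itB.hasNext()) {
    Entry<Key,Value> ea = itA.next(), eb = itB.next();
    // Compare everything except the timestamp
    if (!ea.getKey().equals(eb.getKey(), PartialKey.ROW_COLFAM_COLQUAL_COLVIS)
        || !ea.getValue().equals(eb.getValue())) {
      System.out.println("Mismatch: " + ea.getKey() + " vs " + eb.getKey());
      break;
    }
  }
  if (itA.hasNext() || itB.hasNext())
    System.out.println("Tables contain a different number of entries");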


I tried to write a slightly more parallel approach to verifying this 
based on a Merkle tree.


https://github.com/apache/accumulo/tree/master/test/system/merkle-replication

It's a little tricky as the boundaries of each leaf-node in the tree (a 
tablet) can affect the root value of the tree. In other words, if you 
don't have the same split points on both tables, the verification would 
fail.


z11373 wrote:

We currently write to tables in 2 places (this may change once we leverage
Accumulo 1.7 replication feature or another solution). I wonder if Accumulo
provides (or someone already wrote) the tool to compare data from both
tables (from 2 different Accumulo instances)?
The naïve solution I can think of is to iterate both tables (since they are
already sorted by row ids) and perform something like a 'merge' comparison, but it'd
definitely save my time if someone already wrote the implementation.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/data-comparison-tool-tp15537.html
Sent from the Developers mailing list archive at Nabble.com.


Re: total table rows

2015-11-09 Thread Josh Elser
Note that CountingIterator is in the system iterator package 
(FirstEntryInRowIterator also isn't in the user package for iterators, 
so its stability is a little questionable too). I think David ran into 
this a long time ago as well.


Stable versions of both of these would be good, IMO. It isn't like Z is 
the first one to ask how to count the unique rows :)


William Slacum wrote:

Pranked... you can't use a CountingIterator, because it can't be init'd.
Can we get rid of that limitation?

On Mon, Nov 9, 2015 at 10:43 AM, William Slacum<wsla...@gmail.com>  wrote:


An iterator stack of FirstEntryInRowIterator + CountingIterator will
return the count of rows in each tablet, which can then be combined on the
client side.

On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser<josh.el...@gmail.com>  wrote:


Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
with enumerating them (or explicitly tracking them yourself at ingest time).

The easiest approach you can take is probably using the
FirstEntryInRowIterator and counting each row on the client-side.

You could do another summation in a second iterator but this is a little
tricky to get correct. I tried to touch on this a little in a blog post[1].
If this is a one-off question you want to answer, doing the summation on
the client side is likely not to take excessively longer than a server-side
summation.

[1]
https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo


z11373 wrote:


I want to get the total rows of a table (likely more than 100M rows). I think
to get that information, Accumulo would have to iterate all rows :-( This
may not be a typical Accumulo scenario.

Is there a more efficient way to get the total number of rows in a table?
When Accumulo iterates those items, does it mean it will pull the data to
the client? If yes, is there a way to ask it to return just the number,
since that's the only data I care about.

Thanks,
Z



--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
Sent from the Developers mailing list archive at Nabble.com.





Re: total table rows

2015-11-09 Thread Josh Elser
Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck 
with enumerating them (or explicitly tracking them yourself at ingest time).


The easiest approach you can take is probably using the 
FirstEntryInRowIterator and counting each row on the client-side.
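
A minimal sketch of that, assuming a Connector 'conn' and some
Authorizations 'auths' (the iterator priority is arbitrary here):

  // uses org.apache.accumulo.core.client.{Scanner, IteratorSetting} and
  // org.apache.accumulo.core.iterators.FirstEntryInRowIterator
  Scanner scanner = conn.createScanner("mytable", auths);
  // With this iterator attached, each row yields exactly one entry
  scanner.addScanIterator(new IteratorSetting(30, FirstEntryInRowIterator.class));
  long rowCount = 0;
  for (Entry<Key,Value> entry : scanner)
    rowCount++;
  System.out.println("rows: " + rowCount);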


You could do another summation in a second iterator but this is a little 
tricky to get correct. I tried to touch on this a little in a blog 
post[1]. If this is a one-off question you want to answer, doing the 
summation on the client side is likely not to take excessively longer 
than a server-side summation.


[1] 
https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo


z11373 wrote:

I want to get the total rows of a table (likely more than 100M rows). I think
to get that information, Accumulo would have to iterate all rows :-( This
may not be a typical Accumulo scenario.

Is there a more efficient way to get the total number of rows in a table?
When Accumulo iterates those items, does it mean it will pull the data to
the client? If yes, is there a way to ask it to return just the number,
since that's the only data I care about.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
Sent from the Developers mailing list archive at Nabble.com.


Re: total table rows

2015-11-09 Thread Josh Elser

No worries, just getting everyone on the same page :)

David Medinets wrote:

Shutting up now. :)

On Mon, Nov 9, 2015 at 11:06 AM, Josh Elser<josh.el...@gmail.com>  wrote:


The question was to compute the number of rows, not the number of entries.
The metadata table does not track the number of rows.

David Medinets wrote:


It's not recommended to read the Metadata table? When I needed the 'real'
number, I ran a compaction. When I needed an estimate I just read the
table. I also upgraded our ingest process to track numbers as a second
phase to avoid the need for compaction to get 'real' numbers.

On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser<josh.el...@gmail.com>   wrote:

Note that CountingIterator is in the system iterator package
(FirstEntryInRowIterator also isn't in the user package for iterators, so
its stability is a little questionable too). I think David ran into this a
long time ago as well.

Stable versions of both of these would be good, IMO. It isn't like Z is
the first one to ask how to count the unique rows :)


William Slacum wrote:

Pranked... you can't use a CountingIterator, because it can't be init'd.
Can we get rid of that limitation?

On Mon, Nov 9, 2015 at 10:43 AM, William Slacum<wsla...@gmail.com>  wrote:

An iterator stack of FirstEntryInRowIterator + CountingIterator will
return the count of rows in each tablet, which can then be combined on the
client side.

On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser<josh.el...@gmail.com>  wrote:

Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
with enumerating them (or explicitly tracking them yourself at ingest
time).

The easiest approach you can take is probably using the
FirstEntryInRowIterator and counting each row on the client-side.

You could do another summation in a second iterator but this is a little
tricky to get correct. I tried to touch on this a little in a blog post[1].
If this is a one-off question you want to answer, doing the summation on
the client side is likely not to take excessively longer than a server-side
summation.

[1]
https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo


z11373 wrote:

I want to get the total rows of a table (likely more than 100M rows). I think
to get that information, Accumulo would have to iterate all rows :-( This
may not be a typical Accumulo scenario.

Is there a more efficient way to get the total number of rows in a table?
When Accumulo iterates those items, does it mean it will pull the data to
the client? If yes, is there a way to ask it to return just the number,
since that's the only data I care about.

Thanks,
Z



--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
Sent from the Developers mailing list archive at Nabble.com.







Re: [DISCUSS] What to do about encryption at rest?

2015-11-05 Thread Josh Elser
+1 I think this is the right step. My hunch is that, given the common
data access patterns we have in Accumulo (vs. HBase), per-colfam
encryption isn't quite as common a design pattern as it is for HBase
(please tell me I'm wrong if anyone disagrees -- this is mostly a gut
reaction). I think our users would likely benefit more from a
per-namespace/table encryption control like you suggest.


Implementing RFile encryption at the HDFS level (e.g. tying a specific
zone/key to a table) is probably straightforward. Changing the
TServer's WAL use would likely be trickier to get right (a tserver would
have multiple WALs, one for each unique zone/key of the Tablets it happens
to host). Maybe worrying about that is getting ahead of things -- just 
thought about it and figured I'd mention it :)


William Slacum wrote:

Yup, #2. I also don't know if it's worth the effort for that specific
feature. It might be easier to add something like per-namespace and/or
per-table encryption, then define common access patterns for applications
that want to use multiple keys for encryption.



On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs  wrote:


Bill,

Do you envision one of the following as the driver behind finer-grained
encryption?:

1. We would only encrypt certain columns in order to get better
performance;

2. We would use different keys on different columns in order to revoke
access to a column via the key store;

3. We would only give a tablet server access to a subset of columns at any
given time in order to protect something, and figure out what to do for
compactions, etc.;

4. Something entirely different...

Seems like thing #2 might have merit, but I'm not sure it's worth the
effort.

Adam
On Nov 4, 2015 7:38 PM, "William Slacum"  wrote:


@Adam, column family level encryption can be useful for multi-tenant
environments, and I think it maps pretty well to the document
partitioning/sharding/wikisearch style tables. Things are trickier in
Accumulo than in HBase since there isn't a 1:1 mapping between column
families and files. The built in RFile encryption scheme seems better
suited to this.

@Christopher & Keith, it's something we can evaluate. Is there a good test
harness for just writing an RFile, opening a reader to it, and just poking
around? I was looking at the constructors and they didn't seem
straightforward enough for me to comprehend them within a few seconds.



On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner wrote:


On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner wrote:



On Mon, Nov 2, 2015 at 12:27 PM, William Slacum wrote:

Is "the code being 'at rest'" you making a funny about active

development?

Making sure I haven't lost my ability to get jokes :)

I see two reasons why the code would be inactive: the feature is good
enough as is or it's not interesting enough to attract attention.
Considering it's not public API, there are no discussions to bring into the
public API, and there's no effort to document how to use it, my intuition
tells me that there isn't enough interest in it from a project
perspective.

From a user perspective, I've been getting asked about it when I work with
Accumulo users. My recommendation, exclusively, is to use HDFS encryption
because I can go to Hadoop's website and find documentation on it. When I
go to find documentation on Accumulo's offerings, any usability information
comes from vendor SlideShares. Most mentions of the feature on official
Apache Accumulo channels echo Christopher's sentiments on the feature being
experimental and not being officially recommended for use.

I wouldn't want to rip out the feature first and then figure things out
later. Sean already alluded to it, but a roadmap should contain something
(tool or documentation) to help users migrate if we go down that route.

What I'm trying to figure out is, when the question of "How do I do
encryption at rest in Accumulo?" comes up, what is our community's answer?

If we went down the route of using HDFS encryption zones, can we offer the
same features? At the very least, we'd be offering the same database-level

Where does the decryption happen with DFS, is it in the DFS client? If so,
using HDFS level encryption seems to offer the same functionality???

Has anyone written a tool that takes an
Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
unexpected gotchas w/ this.


I was discussing my questions w/ Christopher today and he mentioned an
experiment that I thought was interesting.   What is the random seek
performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
Accumulo-unencrypted-HDFS-encrypted-RFile?







Re: [DISCUSS] What to do about encryption at rest?

2015-11-05 Thread Josh Elser

SGTM

William Slacum wrote:

Just to moonwalk back a bit, I see a few things happening concurrently now.
First is trying to get a consensus on where we want to go with the
encryption at rest story in Accumulo.

I see us having established that what we have is scoped down to working for
WALs and RFiles, and if you happen to have written it, you are satisfied.
However, as a project, we haven't pulled it into the public API and haven't
provided documentation, so if you haven't written it, the process of
finding out how to configure and use the feature is indirect.

There is some consensus about moving to using HDFS encryption to achieve
the same features, but we want to test and see if the performance is
comparable between it and Accumulo's RFile encryption capability. There may
be caveats based on how you encrypt the data. We want to explore this
space. Mike would like a Jira ticket to outline this.

For adding features to Accumulo, we could potentially add encryption at the
column level. Questions about this involve the level of effort for
supporting this because, compared to other solutions, dynamic locality
groups make this a more difficult task when compared to products with a 1:1
mapping between locality groups and column families (as well as an extra
mapping to files).

Did I miss anything?

On Thu, Nov 5, 2015 at 1:27 PM, Adam Fuchs<afu...@apache.org>  wrote:


Camps two and three are the same camp, really. If we can identify a clear
roadmap (eventually via the right set of tickets), then it comes down to
whether people have energy and inclination to do the work. I don't think
the roadmap ends here.

Adam

On Thu, Nov 5, 2015 at 1:18 PM, Christopher<ctubb...@apache.org>  wrote:


Perhaps. I had interpreted some of Adam's comments ("The only thing that
doesn't get encrypted is a temporary WAL recovery file. That is a project
we should take on..."), as favoring improvements to the current state of
things. As that has also been the focus of previous conversations about the
state of Accumulo's encryption-at-rest, I assumed that third camp also
existed. Perhaps I was wrong.

On Thu, Nov 5, 2015 at 1:11 PM Mike Drob<md...@apache.org>  wrote:


I think you have misidentified the two camps. There is a camp that believes
we should phase out the code in favour of the HDFS encryption, and a camp
that believes the code is sufficiently mature. I don't think there is a
group that is interested in improving the state of things.

On Thu, Nov 5, 2015 at 12:02 PM, Christopher<ctubb...@apache.org>  wrote:

JIRAs are fine, but I thought this thread was mostly addressing the fact
that there doesn't seem to be a sustained interest in actually working on
any of the JIRAs addressing that area of code. Am I wrong? Is there
willingness from anybody to expend effort on this code? Even if not, we can
still make JIRAs, but they'll probably just be ignored. So, the question
for me is: which JIRAs should we make? Are we going to pursue phasing out
the code, or pursue improving it? Those are very different JIRA text.

On Thu, Nov 5, 2015 at 12:22 PM Mike Drob<md...@apache.org>  wrote:


Can we file some JIRAs to build out a suite to test this and run the
necessary tests?

On Thu, Nov 5, 2015 at 11:17 AM, Christopher<ctubb...@apache.org>  wrote:

My main concern using HDFS encryption vs. built-in Accumulo implementation
is possibly performance with respect to seeks. If we encrypt our indexed
blocks independently (as we do now), I suspect our seeks would be more
performant than relying on HDFS encryption, whose encrypted blocks may not
fall on our index boundaries. If this is a small difference, it might still
be worth it for convenience and simpler maintenance, but I suspect the
difference will be somewhat substantial.

On Thu, Nov 5, 2015 at 12:11 PM Josh Elser<josh.el...@gmail.com>  wrote:

+1 I think this is the right step. My hunch is that, given the common
data access patterns we have in Accumulo (vs. HBase), per-colfam
encryption isn't quite as common a design pattern as it is for HBase
(please tell me I'm wrong if anyone disagrees -- this is mostly a gut
reaction). I think our users would likely benefit more from a
per-namespace/table encryption control like you suggest.

Implementing RFile encryption at the HDFS level (e.g. tying a specific
zone/key to a table) is probably straightforward. Changing the TServer's
WAL use would likely be trickier to get right (a tserver would have
multiple WALs, one for each unique zone/key of the Tablets it happens to
host). Maybe worrying about that is getting ahead of things -- just
thought about it and figured I'd mention it :)

William Slacum wrote:

Yup, #2. I also don't know if it's worth the effort for that specific
feature. It might be easier to add something like per-namespace and/or
per-table encryption, then define common access patterns for applications
that want to use multiple keys for encryption.

Re: [DISCUSS] What to do about encryption at rest?

2015-11-03 Thread Josh Elser

Josef Roehrl - PHEMI wrote:

Thanks for exposing the issues on this.  I had equated 'stale' with
incomplete, but I was missing the point entirely.  In this case, 'stale'
equates to complete, working and stable (but not changing).


(pedantically) minus the intermediate-WAL recovery files not being encrypted


On Sat, Oct 31, 2015 at 4:22 PM, Josef Roehrl - PHEMI
wrote:


For this reason, we were just thinking of waiting for Encryption at Rest
with HDFS.  Presumably, Accumulo could optimize encryption if it
implemented encryption itself with a few trade-offs.

On Fri, Oct 30, 2015 at 10:22 PM, William Slacum
wrote:


So I've been looking into options for providing encryption at rest, and it
seems like what Accumulo has is abandonware from a project perspective.
There is no official documentation on how to perform encryption at rest,
and the best information from its status comes from year (or greater) old
ticket comments about how the feature is still experimental. Recently
there
was a talk that described using HDFS encryption zones as an alternative.

 From my perspective, this is what I see as the current situation:

1- Encryption at rest in Accumulo isn't actively being worked on
2- Encryption at rest in Accumulo isn't part of the public API or marketed
capabilities
3- Documentation for what does exist is scattered throughout Jira comments
or presentations
4- A viable alternative exists that appears to have feature parity in HDFS
encryption
5- HBase has finer grained encryption capabilities that extend beyond what
HDFS provides

Moving forward, what's the consensus for supporting this feature?
Personally, I see two options:

1- Start going down a path to bring the feature into the forefront and
start providing feature parity with HBase

or

2- Remove the feature and place emphasis on upstream encryption offerings

Any input is welcomed & appreciated!




--


Josef Roehrl
Senior Software Developer
*PHEMI Systems*
180-887 Great Northern Way
Vancouver, BC V5T 4T5
604-336-1119









Re: [DISCUSS] What to do about encryption at rest?

2015-10-30 Thread Josh Elser



William Slacum wrote:

So I've been looking into options for providing encryption at rest, and it
seems like what Accumulo has is abandonware from a project perspective.
There is no official documentation on how to perform encryption at rest,
and the best information from its status comes from year (or greater) old
ticket comments about how the feature is still experimental. Recently there
was a talk that described using HDFS encryption zones as an alternative.

 From my perspective, this is what I see as the current situation:

1- Encryption at rest in Accumulo isn't actively being worked on
2- Encryption at rest in Accumulo isn't part of the public API or marketed
capabilities
3- Documentation for what does exist is scattered throughout Jira comments
or presentations
4- A viable alternative exists that appears to have feature parity in HDFS
encryption
5- HBase has finer grained encryption capabilities that extend beyond what
HDFS provides

Moving forward, what's the consensus for supporting this feature?
Personally, I see two options:

1- Start going down a path to bring the feature into the forefront and
start providing feature parity with HBase

or

2- Remove the feature and place emphasis on upstream encryption offerings


+1

I'm only smart enough to know that I'm not smart enough to build a 
distributed database *and* encrypt it securely. I'd much prefer to defer 
to the people up the stack.


The one thing we'd miss out on is things like column-family-level 
encryption control (which I think HBase has), but I'd much rather have a 
complete encryption story before worrying about the fine-grained support.



Any input is welcomed & appreciated!



Re: tablet split

2015-10-20 Thread Josh Elser
IIRC, either half of a split tablet will remain on the same node as the 
parent; however the next invocation of the configured balancer might 
move them per its policy.


z11373 wrote:

As I understand it, Accumulo will have data already sorted by row id, and
if the number of rows grows, it will split the tablet at some point.
For example, let say I have following row ids:

1_abcxxx
1_abdxxx
1_abexxx
1_abfxxx
1_abgxxx
1_abhxxx
1_abixxx
...
1_zzzxxx
2_abcxxx
2_abdxxx
2_abexxx
2_abfxxx
2_abgxxx
2_abhxxx
...

Let's say the data with row ids starting with "1_" has a million rows, and for
the sake of example, let's say the tablet size is 400K, so in this case the "1_"
data will be split into 3 tablets.

My question is will Accumulo distribute those 3 tablets into different
tablet server nodes? Or perhaps two or all of them will remain in that
original tablet server?


Thanks,
Z




--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/tablet-split-tp15399.html
Sent from the Developers mailing list archive at Nabble.com.


Problems with maven-java-formatter-plugin causing master builds to fail.

2015-10-18 Thread Josh Elser
It seems like the current version of maven-java-formatter-plugin that we 
depend on is adding in trailing whitespace to fate/**/ZooCache.java 
which is then causing the checkstyle verification to fail the build.


An example: 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/359/console


The repository doesn't contain any trailing whitespace on L239, yet on 
my build machine:


diff --git a/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java b/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
index 8c9f80d..33e5261 100644
--- a/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
+++ b/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
@@ -290,7 +290,7 @@ public class ZooCache {
     /*
      * The following call to exists() is important, since we are caching that a node does not exist. Once the node comes into existence, it will be added to
      * the cache. But this notification of a node coming into existence will only be given if exists() was previously called.
-     *
+     *
      * If the call to exists() is bypassed and only getData() is called with a special case that looks for Code.NONODE in the KeeperException, then
      * non-existence can not be cached.
      */

@Christopher, is this a bug in your plugin? Maybe the underlying 
formatter itself?


Re: Problems with maven-java-formatter-plugin causing master builds to fail.

2015-10-18 Thread Josh Elser

Thanks. Fixed in https://issues.apache.org/jira/browse/ACCUMULO-4033

Christopher wrote:

Or make it a javadoc comment.

On Sun, Oct 18, 2015, 18:12 Christopher<ctubb...@apache.org>  wrote:


It's a bug in the formatter. The fix is to avoid newlines in multi-line
non-javadoc comments. You could switch to // comments or remove the blank
line.

On Sun, Oct 18, 2015, 18:06 Josh Elser<josh.el...@gmail.com>  wrote:


It seems like the current version of maven-java-formatter-plugin that we
depend on is adding in trailing whitespace to fate/**/ZooCache.java
which is then causing the checkstyle verification to fail the build.

An example:

https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/359/console

The repository doesn't contain any trailing whitespace on L239, yet on
my build machine:

diff --git a/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java b/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
index 8c9f80d..33e5261 100644
--- a/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
+++ b/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java
@@ -290,7 +290,7 @@ public class ZooCache {
     /*
      * The following call to exists() is important, since we are caching that a node does not exist. Once the node comes into existence, it will be added to
      * the cache. But this notification of a node coming into existence will only be given if exists() was previously called.
-     *
+     *
      * If the call to exists() is bypassed and only getData() is called with a special case that looks for Code.NONODE in the KeeperException, then
      * non-existence can not be cached.
      */

@Christopher, is this a bug in your plugin? Maybe the underlying
formatter itself?





Re: accumulo framework for mesos

2015-10-09 Thread Josh Elser
Thanks for sharing, Jim. Think it's in a state to mention on 
http://accumulo.apache.org/projects.html?


If so, want to send a small blurb to include on the page?

Jim Klucar wrote:

Hey

accumulo-mesos is a framework for running Accumulo on top of a Mesos
cluster (or having several Accumulo instances running on a cluster
together). It still needs work and a user interface, but I've deployed it
to a 15-node AWS instance and it seemed to work ok. Lots more coming if it
gets any traction.

https://github.com/aredee/accumulo-mesos

-Jim



Re: [ANNOUNCE] Apache Accumulo 1.6.4 released

2015-10-09 Thread Josh Elser
You have, and have always had, the ability to moderate your own mailing 
list subscriptions just as you signed yourself up to the list.


For example, to unsubscribe from u...@accumulo.apache.org, send an email 
to user-unsubscr...@accumulo.apache.org and follow the instructions you 
receive in reply.


Margo wrote:

Please unsubscribe me from this mailing list. My request is to all who
are in cc or to.

I am unable to find the unsubscribe link in this email and I do not want
to mark it as spam since it is really not spam but not relevant to me as
of now. If I require, I want to join the mailing list in future.

Please try to understand and unsubscribe me.

On Thu, Oct 8, 2015 at 1:14 AM, Josh Elser <els...@apache.org> wrote:

The Apache Accumulo project is happy to announce its 1.6.4 release.

Version 1.6.4 is the most recent bug-fix release in its 1.6.x release
line. This version includes a fix for a data-loss bug over previous
versions in addition to other bug fixes. Existing users of the 1.6.x
release line are encouraged to upgrade immediately with confidence.

The Apache Accumulo sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing.  It is based on
Google's BigTable design and is built on top of Apache Hadoop,
Apache Zookeeper, and Apache Thrift.

The release is available at http://accumulo.apache.org/downloads/ and
release notes at http://accumulo.apache.org/release_notes/1.6.4.html.

- The Apache Accumulo Team




Re: another question on summing combiner

2015-10-09 Thread Josh Elser
If you were doing a batch job to just recompute the stats, I'd probably 
make a new table and then rename it, replacing your old stats table. 
This can also be problematic in making sure clients that are still 
writing data will correctly write to the new table. Can you quiesce 
ingest temporarily?
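
The swap itself is just a couple of rename calls (a minimal sketch,
assuming a Connector 'conn'; the table names are hypothetical and the
batch job would have fully built "stats_new" beforehand):

  // uses org.apache.accumulo.core.client.admin.TableOperations
  TableOperations ops = conn.tableOperations();
  ops.rename("stats", "stats_old");  // move the stale table aside
  ops.rename("stats_new", "stats");  // rebuilt table takes its place
  ops.delete("stats_old");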


In short, this is hard to do correctly (and there are edge cases that 
could potentially happen that make the table inaccurate at a very low 
probability). Have you considered just running the system for a while 
and seeing how skewed your stats are?


It kind of sounds like the easier problem to solve is whether or not 
some record exists in your system and then you can know definitively 
whether or not you need to even process that record again (much less 
update the stats table).


z11373 wrote:

Revisit this topic, if I go with option #2, i.e. having a batch job to fix
the stats table, now I am not really sure if it will work, since the stats
table already have summing combiner enabled, hence the batch job can't just
update the value since it'll be incorrect.
For example:

Current stats table contains:
foo | 2
bar | 3
test| 1

The batch job scan the main table, and going to update the stats table, let
say the actual stats is foo=1, bar=4, test=1, hence the final stats table
would become:
foo | 3
bar | 7
test| 2

It'd be correct if it removes the summing combiner from the table, but then
another process (not the batch job) may update particular key, overwriting
the correct value (updated from batch job). We can't tolerate the system is
offline, otherwise we can refresh the stats during that downtime. Any idea
on how to solve this problem?

Unfortunately there is an inherent problem with the summing combiner, i.e. when
adding the same key to the main table, it'll behave just like an 'update' when the
same key already exists, but my current logic will add|1 to the stats table,
so if we have many 'updates', then some values in the stats table will be far
off. Similar case for deleting: it will be a no-op for the main table if the key
doesn't exist, but the app logic will add|-1 to the stats table. This
is the reason why we're thinking to have a batch job to 'fix' the stats
table, but that also has its own problem :-(


Thanks,
Z






--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/another-question-on-summing-combiner-tp15238p15351.html
Sent from the Developers mailing list archive at Nabble.com.


Re: new committers!

2015-10-07 Thread Josh Elser

Billie Rinaldi wrote:

Join me in welcoming Dylan Hutchison and Russ Weeks as new Apache Accumulo
committers and PMC members!  Dylan and Russ, feel free to tell us about
yourselves and your development interests.

Billie



Yay! Congrats and welcome, Dylan and Russ!


[ANNOUNCE] Apache Accumulo 1.6.4 released

2015-10-07 Thread Josh Elser

The Apache Accumulo project is happy to announce its 1.6.4 release.

Version 1.6.4 is the most recent bug-fix release in its 1.6.x release
line. This version includes a fix for a data-loss bug over previous 
versions in addition to other bug fixes. Existing users of the 1.6.x 
release line are encouraged to upgrade immediately with confidence.


The Apache Accumulo sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing.  It is based on
Google's BigTable design and is built on top of Apache Hadoop,
Apache Zookeeper, and Apache Thrift.

The release is available at http://accumulo.apache.org/downloads/ and
release notes at http://accumulo.apache.org/release_notes/1.6.4.html.

- The Apache Accumulo Team


Re: How to best measure how the lack of data-locality affects query performance

2015-10-07 Thread Josh Elser



Jeff Kubina wrote:

Per my thread "How does Accumulo process r-files for bulk ingesting?"
on the user@ list I would like to test/measure how a lack of
data-locality of bulk ingested files effects query performance. I seek
comments/suggestions on the outline of the design for the test:

Outline:
1. Create a table and pre-split it to have m tablets where m="total tservers".
2. Create 1 r-file containing m*n records that evenly distribute
across the m tablets.
3. Bulk ingest the r-file.
4. Query each of the split ranges in the table and log their times.
5. Compact the table and wait for the compaction to complete.
6. Query each of the split ranges in the table and log their times.
7. Compute the ratio of the median times from steps 4 and 6.

Questions:
1. Instead of compacting the table should I create a new table by
generating the m r-files whose ranges intersect only one of the
tablets and bulk ingest them?


If you can be tricky in your non-data-local case to evenly balance the 
data, you could just do one table import followed by a compaction and 
rerun on the same table.


You'd just want to make sure you have a decent distribution of the data 
across all servers in both the data-local and non-data-local cases.
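
For step 1, pre-splitting is just TableOperations.addSplits (a minimal
sketch, assuming a Connector 'conn'; the table name and split points are
illustrative and would need to line up with how the r-file's keys are
generated):

  // uses java.util.{SortedSet, TreeSet} and org.apache.hadoop.io.Text
  int m = 10; // number of tservers
  SortedSet<Text> splits = new TreeSet<Text>();
  for (int i = 1; i < m; i++)
    splits.add(new Text(String.format("row_%02d", i)));
  conn.tableOperations().create("locality_test");
  conn.tableOperations().addSplits("locality_test", splits);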



2. What is a good size for n, the number of records per tablet server?


I'm wondering if it depends on the type of workload that you're looking 
to run. Does it make a difference if you're just running randomized 
point queries? Or doing a scan over the entire table?


Assuming you're just doing one tablet per server for your table (it's 
not apparent to me if there's a reason that would result in a lesser 
test), I'd guess a couple 100MB's worth of records per tablet would be 
good. Enough to get a few HDFS blocks per RFile, but not enough that 
Accumulo would automatically split it from underneath you. You could 
also try to increase the split threshold and put more data per file.


Re: scan command hung

2015-10-06 Thread Josh Elser



z11373 wrote:

Thanks Billie/Josh! That's indeed fixing the issue, the scan now returns
instantly!!

So when we scan the whole table, filtering by column family, Accumulo
still has to go through all rows (ordered by the key), and check if the
particular item has the specific column family, and in my case since they are
intermingled, the data I am looking for could be somewhere in the middle or
in the end of the rfile, am I right?

I did another experiment, if I specify -b and -e, then it also returned
instantly (this was before I moved them to a different group and compacted), which
does make sense, because Accumulo could narrow down to specific ranges, and
then filter them by column family.

I have another follow-up question: does it mean I have to create a new
locality group for each column family, since I wouldn't know how big/small
the data belonging to that cf is in advance?

Btw, we shard the customers by putting their id as the column family, so we'll
add a new column family whenever a new customer onboards. I think the case in
which we have to scan the table with a cf without specifying ranges may be rare
(or perhaps never, except if I run it from the shell), but I am worried this can
become a perf bottleneck if I don't set them to a separate locality group.


This strikes me as very odd. Sharding is the process of distributing 
some data set across multiple nodes. The only way this is done in 
Accumulo is by the row, not the column family. If you want fast, 
point-lookups by customer, you'd want this customer ID in the row. If 
that's a non-starter for some reason, this is a case where you'd want to 
implement a secondary index (usually as a separate table) that does have 
the customer ID in the row which then points to the row+colfam in your 
"data" table.


e.g. say your data is sharded/hashed/whatever by date.

20151006_1 cust_id_1:attr1 => value
20151006_1 cust_id_1:attr2 => value

You would make a second table which has something like

cust_id_1 : => 20151006_1

Where you have an empty colfam/colqual. There are ways you could also 
use these extra fields to perform extra filtering.
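
A minimal sketch of that two-step lookup, assuming a Connector 'conn'
and the hypothetical table names "index" and "data" from the example:

  // uses org.apache.accumulo.core.client.{Scanner, BatchScanner} and
  // org.apache.accumulo.core.data.{Key, Value, Range}
  // 1. Look up the data-table row(s) for the customer in the index table
  Scanner index = conn.createScanner("index", auths);
  index.setRange(Range.exact("cust_id_1"));
  List<Range> ranges = new ArrayList<Range>();
  for (Entry<Key,Value> e : index) {
    // the value holds the row in the data table, e.g. "20151006_1"
    ranges.add(Range.exact(e.getValue().toString(), "cust_id_1"));
  }

  // 2. Fetch the actual data with a BatchScanner
  BatchScanner data = conn.createBatchScanner("data", auths, 4);
  data.setRanges(ranges);
  for (Entry<Key,Value> e : data)
    System.out.println(e.getKey() + " -> " + e.getValue());
  data.close();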


Ultimately, locality groups are meant to provide coarse grouping of "types
of data" together rather than quick random access over an entire
dataset. Does that make sense?



Another question: when running the setgroups command, it looks like I have to
set all of them, even if I just added a new cf. For example, let's say I did:
setgroups mygroup=cf1,cf2 -t mytable
compact -t mytable -w

Then later I need to add cf3 to the same group, I have to do "setgroups
mygroup=cf1,cf2,cf3 -t mytable", instead of just "setgroups mygroup=cf3 -t
mytable"

It'd be nice if I could do the latter :-) What happens with cf1 and cf2 if I
did the latter? Does it mean they come back to the default group again
after compaction?


Thanks,
Z




--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/scan-command-hung-tp15286p15324.html
Sent from the Developers mailing list archive at Nabble.com.


Re: scan command hung

2015-10-05 Thread Josh Elser
Yup that's exactly what my hunch was. You can try configuring a locality
group for your "slow" column families, compact the table and then rerun
your scans. They should be fast after you do this.
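
Via the Java API, that's roughly (a minimal sketch, assuming a Connector
'conn'; the group and family names are illustrative):

  // uses java.util.{Map, HashMap, Set, Collections} and
  // org.apache.hadoop.io.Text
  Map<String,Set<Text>> groups = new HashMap<String,Set<Text>>();
  groups.put("slowfams", Collections.singleton(new Text("foo")));
  conn.tableOperations().setLocalityGroups("TABLE1", groups);
  // flush=true, wait=true: block until the rewrite finishes
  conn.tableOperations().compact("TABLE1", null, null, true, true);
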
On Oct 5, 2015 11:25 AM, "z11373"  wrote:

> Hi Josh,
> I see there are 4 tablet files for that table, and all of them are in range
> from 730MB to 860MB in size.
> For those column families that have the problem, they are in 2 of those 4
> tablets.
> They are only a few rows, but for those column families which have no
> problem, they have millions of rows.
> This makes me think the slowness is because it has to find those 'few'
> rows among those 'gigantic' rows in that physical tablet file?
>
> Thanks,
> Z
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/scan-command-hung-tp15286p15320.html
> Sent from the Developers mailing list archive at Nabble.com.
>


No Javadocs in 1.6.4 bin tarball

2015-10-03 Thread Josh Elser
While updating the website, I noticed that the 1.6.4 bin tarball doesn't 
contain pre-built javadocs like 1.5.4 did.


My memory doesn't recall if this was an intentional change or something 
that just fell through the cracks.


Does anyone remember?


Re: No Javadocs in 1.6.4 bin tarball

2015-10-03 Thread Josh Elser

Okie doke. I will update the release guidelines to make sure this is explicit.

Thanks!

Christopher wrote:

This was an intentional change with 1.6.x

On Sat, Oct 3, 2015, 16:56 Josh Elser<josh.el...@gmail.com>  wrote:


While updating the website, I noticed that the 1.6.4 bin tarball doesn't
contain pre-built javadocs like 1.5.4 did.

My memory doesn't recall if this was an intentional change or something
that just fell through the cracks.

Does anyone remember?





[RESULT] [VOTE] Accumulo 1.6.4-rc1

2015-10-02 Thread Josh Elser

This vote passes with 4 +1s and nothing else.

Thanks to all who took the time to evaluate the RC.

- Josh

Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.6.4.

Git Commit:
edba4f4ca95d9e8ec0299a631234e5c9a319f9ec
Branch:
1.6.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.6.4' -s 1.6.4
edba4f4ca95d9e8ec0299a631234e5c9a319f9ec

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/org/apache/accumulo/accumulo/1.6.4/accumulo-1.6.4-src.tar.gz

Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/org/apache/accumulo/accumulo/1.6.4/accumulo-1.6.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes have not yet been started, but the primary purpose for
this release is to provide a 1.6 release for the bulk-load dataloss bug.

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.6.4 release of Apache Accumulo.

This vote will end on Fri Oct 2 04:30:00 UTC 2015
(Fri Oct 2 00:30:00 EDT 2015 / Thu Oct 1 21:30:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/

# note the trailing slash is needed


Re: scan command hung

2015-10-01 Thread Josh Elser
I'm wondering if the distribution of your few columns across the actual 
rfiles has an impact. I believe it could be that, even without locality 
groups, a subset of the rfiles could be precluded from even being 
opened (because we know your given column family doesn't exist in the file).


So, the one column family happens to be only in some files, where the 
other column family happens to be included in all the files (or at least 
some larger ones). Thus, in one case, Accumulo is just reading much less 
data.


You could try to use `accumulo rfile-info` on some of the RFiles in your 
table, looking for the column families in question.


z11373 wrote:

Hi Keith,
I left that scan command running, and it did return after a minute or so. I
think it's just slow somehow for those particular column families. I'll try
jstack-ing when I have chance later today.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/scan-command-hung-tp15286p15302.html
Sent from the Developers mailing list archive at Nabble.com.


Re: scan command hung

2015-09-29 Thread Josh Elser
Ignoring the prerequisite: "are HDFS, ZooKeeper and Accumulo all running 
properly to the best of your knowledge"...


What does you table look like? How many rows? How many columns per row? 
Are the number of columns per row fixed or varied? Do you have any 
locality groups configured? Have you made any changes to the default 
configuration (e.g. files per tablet, disabled compactions, etc)?


If you open another shell while running a scan on this column family 
that "hangs", try running the `listscans` command. You should see the 
scan from your other shell as well as the server it's trying to 
communicate with. This should narrow down which server's logs to 
inspect. On that server:


* check the OS load (is it actually doing something)
* check the logs for any exceptions/errors
* check memory usage for the TabletServer (the logs also contain regular 
heap-size messages you can grep for).


If you're still stuck here, while you have this scan running, try 
`jstack`'ing that TabletServer a few times in succession and redirect it 
to a file. We should be able to piece together some sort of guess as to 
what the TabletServer is doing.


- Josh

z11373 wrote:

Hi,
I just experienced weird Accumulo issue I never seen before.

Running 'scan -t TABLE1 -c foo' from the shell, it just hung.
However, running scan for another column family (still on same table), i.e.
'scan -t TABLE1 -c bar' returns immediately. I looked at Accumulo logs, and
can't really figure out what's wrong. I tried with third column family, and
it also returns immediately.

It looks like the problem is only for specific column family, is that even
possible??
Any idea on how to troubleshoot this?


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/scan-command-hung-tp15286.html
Sent from the Developers mailing list archive at Nabble.com.


[VOTE] Accumulo 1.6.4-rc1

2015-09-28 Thread Josh Elser

Accumulo Developers,

Please consider the following candidate for Accumulo 1.6.4.

Git Commit:
edba4f4ca95d9e8ec0299a631234e5c9a319f9ec
Branch:
1.6.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.6.4' -s 1.6.4 
edba4f4ca95d9e8ec0299a631234e5c9a319f9ec


Staging repo: 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043
Source (official release artifact): 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/org/apache/accumulo/accumulo/1.6.4/accumulo-1.6.4-src.tar.gz
Binary: 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/org/apache/accumulo/accumulo/1.6.4/accumulo-1.6.4-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a 
given artifact.)


All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes have not yet been started, but the primary purpose for 
this release is to provide a 1.6 release for the bulk-load dataloss bug.


Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.6.4 release of Apache Accumulo.

This vote will end on Fri Oct  2 04:30:00 UTC 2015
(Fri Oct  2 00:30:00 EDT 2015 / Thu Oct  1 21:30:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1043/
# note the trailing slash is needed


Re: [ANNOUNCE] Apache Gora 0.6.1 Release

2015-09-28 Thread Josh Elser
At least reading the interface's javadoc[1], it seems like it would best 
benefit from a public-API tablet locator impl.


Let me stop sleeping, and I'll finally write one :)

[1] 
https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/store/DataStore.java#L187


Eric Newton wrote:

I pulled down their git repo and upgraded it to 1.7.0.

There's code in there (getPartitions) that uses the tablet locator and
KeyExtent.

I wonder if there's a better way to do what it wants (with
AccumuloInputFormat?).

Keith, do you have any ideas on where this code originated?

It should be updated to use the ClientContext stuff.

-Eric

On Tue, Sep 15, 2015 at 11:12 AM, Josh Elser<els...@apache.org>  wrote:


Oy, we need to update their dependency...

Also, another project which would be good to add to Christopher's
rc-verify.

 Original Message 
Subject: [ANNOUNCE] Apache Gora 0.6.1 Release
Date: Mon, 14 Sep 2015 23:26:38 -0700
From: lewis john mcgibbney<lewi...@apache.org>
Reply-To: u...@hbase.apache.org
To: u...@hbase.apache.org<u...@hbase.apache.org>,
u...@accumulo.apache.org,  u...@cassandra.apache.org,
solr-u...@lucene.apache.org,  u...@giraph.apache.org, u...@avro.apache.org
<u...@avro.apache.org>,<u...@gora.apache.org>  <u...@gora.apache.org>,
u...@nutch.apache.org<u...@nutch.apache.org>, annou...@apache.org,
u...@spark.apache.org, u...@camel.apache.org

Hi All,

The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.6.1.

What is Gora?
Gora is a framework which provides an in-memory data model and persistence
for big data. Gora supports persisting to column stores, key value stores,
document stores and RDBMSs, and analyzing the data with extensive Apache
Hadoop™<http://hadoop.apache.org>  MapReduce
<http://hadoop.apache.org/docs/stable/mapred_tutorial.html>  support. This
release also offers input and output formats for Apache Spark.

Whats in this release?

This release addresses a modest 21 issues<http://s.apache.org/l69>  with
many improvements and bug fixes for the gora-mongodb
<http://gora.apache.org/current/gora-mongodb.html>  module, resolution of a
major bug whilst flushing data to Apache Solr, a gora-gradle plugin
<https://issues.apache.org/jira/browse/GORA-330>  and our Gora Spark
backend
support<https://issues.apache.org/jira/browse/GORA-386>. Drop by our
mailing lists and ask questions for information on any of the above.

We provide Gora support for the following projects

- Apache Avro 1.7.6
- Apache Hadoop 1.2.1 and 2.5.2
- Apache HBase 0.98.8-hadoop2 (although also tested with 1.X)
- Apache Cassandra 2.0.2
- Apache Solr 4.10.3
- MongoDB 2.6.X
- Apache Accumulo 1.5.1
- Apache Spark 1.4.1

Gora is released as both source code, downloads for which can be found at
our downloads page<http://gora.apache.org/downloads.html>  as well as
Maven
artifacts which can be found on Maven central
<http://search.maven.org/#search%7Cga%7C1%7Cgora>.

Thank you
Lewis
(On behalf of the Apache Gora PMC)

http://people.apache.org/~lewismc || @hectorMcSpector ||
http://www.linkedin.com/in/lmcgibbney

   Apache Gora V.P || Apache Nutch PMC || Apache Any23 V.P ||
Apache OODT PMC
Apache Open Climate Workbench PMC || Apache Tika PMC || Apache
TAC
 Apache Usergrid || Apache HTrace (incubating) || Apache CommonsRDF
(incubating)






FYI: 1.5 branch deleted (again)

2015-09-26 Thread Josh Elser

Should be finally done with this branch.

Don't forget to run `git fetch --prune` and try not to re-push it, please :)

 Original Message 
Subject: Git Push Summary
Date: Sat, 26 Sep 2015 21:07:34 + (UTC)
From: els...@apache.org
Reply-To: dev@accumulo.apache.org
To: comm...@accumulo.apache.org

Repository: accumulo
Updated Branches:
  refs/heads/1.5 [deleted] 586c4ee19


Re: [ADVISORY] Possible data loss during HDFS decommissioning

2015-09-23 Thread Josh Elser
What kind of documentation can we put in the user manual about this? 
Recommend to only decom one rack at a time until we get the issue sorted 
out in Hadoop-land?


dlmar...@comcast.net wrote:

BLUF: There exists the possibility of data loss when performing DataNode 
decommissioning with Accumulo running. This note applies to installations of 
Accumulo 1.5.0+ and Hadoop 2.5.0+.

DETAILS: During DataNode decommissioning it is possible for the NameNode to 
report stale block locations (HDFS-8208). If Accumulo is running during this 
process then it is possible that files currently being written will not close 
properly. Accumulo is affected in two ways:

1. During compactions temporary rfiles are created, then closed, and renamed. 
If a failure happens during the close, the compaction will fail.
2. Write ahead log files are created, written to, and then closed. If a failure 
happens during the close, then the NameNode will have a walog file with no 
finalized blocks.

If either of these cases happen, decommissioning of the DataNode could hang 
(HDFS-3599, HDFS-5579) because the files are left in an open for write state. 
If Accumulo needs the write ahead log for recovery it will be unable to read 
the file and will not recover.

RECOMMENDATION: Assuming that the replication pipeline for the write ahead log 
is working properly, then you should not run into this issue if you only 
decommission one rack at a time.



Re: [ADVISORY] Possible data loss during HDFS decommissioning

2015-09-23 Thread Josh Elser

-cc user@ (figure I'm forking this into a more dev-focused question now)

True, we don't have procedures for retroactively changing docs. I guess 
JIRA essentially acts as this version-affected discovery mechanism for 
us. People generally seem to understand a search of JIRA to find known 
issues too.


My only worry about creating a page on the website is that it's yet another 
place people have to search to get the details on some operational 
subject. We've been doing well (since 1.6) to capture details like this 
in the user manual, so I figured this would also make sense to mention 
there. Perhaps multiple places is reasonable too?


dlmar...@comcast.net wrote:

Known issue in the release notes on the web page? We would have to
update every version though. Seems like we need a known issues document
that lists issues in dependencies that transcend Accumulo versions.


*From: *"Josh Elser" <josh.el...@gmail.com>
*To: *dev@accumulo.apache.org
*Cc: *u...@accumulo.apache.org
*Sent: *Wednesday, September 23, 2015 10:26:50 AM
*Subject: *Re: [ADVISORY] Possible data loss during HDFS decommissioning

What kind of documentation can we put in the user manual about this?
Recommend to only decom one rack at a time until we get the issue sorted
out in Hadoop-land?

dlmar...@comcast.net wrote:
 > BLUF: There exists the possibility of data loss when performing
DataNode decommissioning with Accumulo running. This note applies to
installations of Accumulo 1.5.0+ and Hadoop 2.5.0+.
 >
 > DETAILS: During DataNode decommissioning it is possible for the
NameNode to report stale block locations (HDFS-8208). If Accumulo is
running during this process then it is possible that files currently
being written will not close properly. Accumulo is affected in two ways:
 >
 > 1. During compactions temporary rfiles are created, then closed, and
renamed. If a failure happens during the close, the compaction will fail.
 > 2. Write ahead log files are created, written to, and then closed. If
a failure happens during the close, then the NameNode will have a walog
file with no finalized blocks.
 >
 > If either of these cases happen, decommissioning of the DataNode
could hang (HDFS-3599, HDFS-5579) because the files are left in an open
for write state. If Accumulo needs the write ahead log for recovery it
will be unable to read the file and will not recover.
 >
 > RECOMMENDATION: Assuming that the replication pipeline for the write
ahead log is working properly, then you should not run into this issue
if you only decommission one rack at a time.
 >



[ANNOUNCE] Apache Accumulo 1.5.4 released

2015-09-21 Thread Josh Elser

The Apache Accumulo project is happy to announce its 1.5.4 release.

Version 1.5.4 is the most recent bug-fix release in its 1.5.x release
line. This version includes a fix for a data-loss bug over previous 
versions in addition to other minor bug fixes. Existing users of 1.5.x 
are encouraged to upgrade to this version immediately. New users are 
still encouraged to start with a 1.6 or 1.7 release.


The Apache Accumulo sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing.  It is based on
Google's BigTable design and is built on top of Apache Hadoop,
Apache Zookeeper, and Apache Thrift.

The release is available at http://accumulo.apache.org/downloads/ and
release notes at http://accumulo.apache.org/release_notes/1.5.4.html.

- The Apache Accumulo Team


Time for 1.6.4

2015-09-21 Thread Josh Elser
(pushing on maintenance releases to counteract blowback from the 
bulk-load bug)


Anyone want to take this on now that 1.5.4 is out of the door?

It will require an audit of the 1.5.4 licensing changes that we made to ensure 
that nothing has changed and that the merges happened as expected, but 
I am happy to help with that if there is confusion.


- Josh


Re: [VOTE] Accumulo 1.5.4-rc2

2015-09-18 Thread Josh Elser

1.5.4-rc2 passes as Apache Accumulo 1.5.4 with five +1's and one -0.

I'll work on promoting the artifacts as time permits.

Josh Elser wrote:

Forgot to explicitly include my +1 (with ACCUMULO-4003 ack'ed)

Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
151db23e7d95cf77c08023ee18b7e524f78286fc
Branch:
1.5.4-rc2

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
151db23e7d95cf77c08023ee18b7e524f78286fc

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz


Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz


(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Fri Sep 18 22:00:00 UTC 2015
(Fri Sep 18 18:00:00 EDT 2015 / Fri Sep 18 15:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/

# note the trailing slash is needed


Re: [VOTE] Accumulo 1.5.4-rc2

2015-09-18 Thread Josh Elser

Forgot to explicitly include my +1 (with ACCUMULO-4003 ack'ed)

Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
151db23e7d95cf77c08023ee18b7e524f78286fc
Branch:
1.5.4-rc2

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
151db23e7d95cf77c08023ee18b7e524f78286fc

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Fri Sep 18 22:00:00 UTC 2015
(Fri Sep 18 18:00:00 EDT 2015 / Fri Sep 18 15:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/
# note the trailing slash is needed


Docs on verifying releases

2015-09-17 Thread Josh Elser
Given the recent spate of licensing blunders and a question by Ed 
Coleman, I made some time earlier this week to write out obligations for 
verifying releases on the website.


http://accumulo.staging.apache.org/verifying_releases.html

Reviews/feedback is welcome as always. Please feel free to 
modify/edit/correct it as you see fit. If I got something wrong, let me 
know and I'll fix it.


I wanted to get some eyes on it before I pushed it to prod.

Thanks all.

- Josh


Re: [VOTE] Accumulo 1.5.4-rc2

2015-09-17 Thread Josh Elser

Relatively quiet on this RC so far.

~1 day left on this one. Make sure to double check the licenses (that's 
all that's changed here over rc1) and cast your vote.


Thanks.

Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
151db23e7d95cf77c08023ee18b7e524f78286fc
Branch:
1.5.4-rc2

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
151db23e7d95cf77c08023ee18b7e524f78286fc

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Fri Sep 18 22:00:00 UTC 2015
(Fri Sep 18 18:00:00 EDT 2015 / Fri Sep 18 15:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1041/
# note the trailing slash is needed


Re: Docs on verifying releases

2015-09-17 Thread Josh Elser
Ah yes, I did forget to aggregate the various pages and include them. I 
meant to do so.


Thanks for the feedback so far, all.

Sean Busbey wrote:

links to the foundation guidelines we're providing application guidance on
would also be a good addition, esp wrt LICENSE/NOTICE aggregation.

this is a really nice addition!

On Thu, Sep 17, 2015 at 8:59 PM, Sean Busbey<bus...@cloudera.com>  wrote:


it'd be nice if the "accumulo correctness" section linked to our existing
doc on that bit:
http://accumulo.staging.apache.org/governance/releasing.html

Or maybe they can be combined?

On Thu, Sep 17, 2015 at 12:20 PM, Mike Drob<md...@mdrob.com>  wrote:


Typo: "While a release of _and_ Apache project"

On Thu, Sep 17, 2015 at 12:13 PM, Josh Elser<josh.el...@gmail.com>
wrote:


Given the recent spate of licensing blunders and a question by Ed

Coleman,

I made some time earlier this week to write out obligations for

verifying

releases on the website.

http://accumulo.staging.apache.org/verifying_releases.html

Reviews/feedback is welcome as always. Please feel free to
modify/edit/correct it as you see fit. If I got something wrong, let me
know and I'll fix it.

I wanted to get some eyes on it before I pushed it to prod.

Thanks all.

- Josh




--
Sean







Re: Docs on verifying releases

2015-09-17 Thread Josh Elser
Just added the foundation docs that came up via my browser auto-complete 
and fixed the typo pointed out by Mike.


Feel free to dbl-check staging. I'll try to remember to promote this 
tmrw when I check up on the RC. If someone else is happy with it, feel 
free to click the button yourself (some sort of consensus is all I'm 
looking for, lazy or not).


Josh Elser wrote:

Ah yes, I did forget to aggregate the various pages and include them. I
meant to do so.

Thanks for the feedback so far, all.

Sean Busbey wrote:

links to the foundation guidelines we're providing application
guidance on
would also be a good addition, esp wrt LICENSE/NOTICE aggregation.

this is a really nice addition!

On Thu, Sep 17, 2015 at 8:59 PM, Sean Busbey<bus...@cloudera.com> wrote:


it'd be nice if the "accumulo correctness" section linked to our
existing
doc on that bit:
http://accumulo.staging.apache.org/governance/releasing.html

Or maybe they can be combined?

On Thu, Sep 17, 2015 at 12:20 PM, Mike Drob<md...@mdrob.com> wrote:


Typo: "While a release of _and_ Apache project"

On Thu, Sep 17, 2015 at 12:13 PM, Josh Elser<josh.el...@gmail.com>
wrote:


Given the recent spate of licensing blunders and a question by Ed

Coleman,

I made some time earlier this week to write out obligations for

verifying

releases on the website.

http://accumulo.staging.apache.org/verifying_releases.html

Reviews/feedback is welcome as always. Please feel free to
modify/edit/correct it as you see fit. If I got something wrong,
let me
know and I'll fix it.

I wanted to get some eyes on it before I pushed it to prod.

Thanks all.

- Josh




--
Sean







Request for help on Bigtop integration

2015-09-14 Thread Josh Elser
Sean Mackrory has staged some changes to Bigtop which add Accumulo 
support to the project.


Sadly, for his own reasons, he's not planning to finish this 
integration, which involves adding tests and keeping a watchful eye over it 
as time passes.


https://issues.apache.org/jira/browse/BIGTOP-1175?focusedCommentId=14743740=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14743740

Before I go and add another responsibility on my plate, I figured I'd 
mention it here and see if anyone else is interested. We can coordinate 
efforts here, or just do it on that Bigtop JIRA issue.


- Josh


Re: sync datacenter

2015-09-10 Thread Josh Elser
I believe one lurking problem would be 
Scanners/BatchScanners/BatchWriters (and maybe other things reading and 
writing data) wouldn't notice the table name swap.


Accumulo presents the "human readable" name for users, but internally 
references things by "table id" (see `tables -l` in the shell). This 
table ID is immutable and uniquely assigned at table creation.


Clients that don't create new scanners/writers after step 3 will 
continue to read/write against the old table.
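
To make that concrete, a minimal sketch of the swap using TableOperations 
(assuming, hypothetically, that step 1 imported into "mytable_import"):

TableOperations ops = conn.tableOperations();
ops.rename("mytable", "mytable_old");       // step 2
ops.rename("mytable_import", "mytable");    // step 3
// Scanners/BatchWriters created before this point still hold the old
// table's ID and must be re-created to see the swapped-in table.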


z11373 wrote:

Thanks Josh!
I see there is a 'renametable' command in Accumulo shell.
One possible option I can think of (all steps are done on target side):
1. Import the table to a temp table name
2. Rename original table to another temp name
3. Rename table from step #1 to correct table name

There is downtime incurred (step 2 and 3), but that window is very small.
This downtime is not what I am worried, but I am not sure if there are other
consequences of doing this operation, do you happen to know?


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/sync-datacenter-tp15087p15105.html
Sent from the Developers mailing list archive at Nabble.com.


Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser

Thanks again for taking the time to inspect things so thoroughly, Sean.

Others who have already voted, I'd ask for your opinion on whether we 
should sink this release (instead of me blindly going by majority rule).


Personally, I'm presently of the opinion that, given the severity of the 
bug(s) fixed in this release already, RC1 should pass. Considering that 
we've been making releases like this for quite some time w/o issue and 
1.5 is all but dead, let's push this release out, (again) table 1.5 and 
then make these improvements to 1.6 before we cut an RC there there when 
we have time to thoroughly vet the changes (instead of the 11th hour of 
a vote).


If there's a need for lengthy discussion, let's break this off the VOTE 
thread (I leave this message here for visibility).


- Josh

Sean Busbey wrote:

-1

* signatures check out
* checksums match
* licensing errors noted in ACCUMULO-3988

On Sat, Sep 5, 2015 at 4:27 PM, Josh Elser<els...@apache.org>  wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
 12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
 1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
 git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz
Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
 mvn release:prepare&&  mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep  10 23:00:00 UTC 2015
(Thurs Sep  10 20:00:00 EDT 2015 / Thurs Sep  10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
 wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/
 # note the trailing slash is needed







Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser
Uh, my understanding is that a binary jar by definition is not a 
foundation sponsored release (it's binary). Where's the docs/history on 
declaring a binary jar as an official release?


The omission of sizzle.js's license in LICENSE and the copying of Thrift's 
NOTICE into our NOTICE for the _official source_ release are still problems.


Sean Busbey wrote:

As members of the PMC, we're required to verify all releases we approve of
meet ASF licensing policy[1], so I don't consider the issues "minor".

Mistakenly violating policy in the past is a different kind of problem than
moving forward to knowingly violate it.

In particular, not all of the bundled works have copyrights that are
covered under a donation to the Foundation. If we distribute e.g. the
accumulo-core binary jar in its current state the foundation will be
committing willful copyright infringement. The binary tarball (and I'd
imagine the rpm/deb files) have similar problems because we'd be violating
the terms of the included works' respective licenses.


[1]: http://www.apache.org/dev/release.html#what-must-every-release-contain

On Thu, Sep 10, 2015 at 11:51 AM, Billie Rinaldi<billie.rina...@gmail.com>
wrote:


Agreed.

On Thu, Sep 10, 2015 at 9:47 AM, Christopher<ctubb...@apache.org>  wrote:


I think the license issues are relatively small compared to the bugfixes,
especially since we're really trying to close out 1.5.x development. So,
given the options, I'd prefer to pass RC1, and make the license fixes in
1.6.x and later, as applicable.

On Thu, Sep 10, 2015 at 12:28 PM Josh Elser<els...@apache.org>  wrote:


Thanks again for taking the time to inspect things so thoroughly, Sean.

Others who have already voted, I'd ask for your opinion on whether we
should sink this release (instead of me blindly going by majority

rule).

Personally, I'm presently of the opinion that, given the severity of

the

bug(s) fixed in this release already, RC1 should pass. Considering that
we've been making releases like this for quite some time w/o issue and
1.5 is all but dead, let's push this release out, (again) table 1.5 and
then make these improvements to 1.6 before we cut an RC there

when

we have time to thoroughly vet the changes (instead of the 11th hour of
a vote).

If there's a need for lengthy discussion, let's break this off the VOTE
thread (I leave this message here for visibility).

- Josh

Sean Busbey wrote:

-1

* signatures check out
* checksums match
* licensing errors noted in ACCUMULO-3988

On Sat, Sep 5, 2015 at 4:27 PM, Josh Elser<els...@apache.org>

wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
  12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
  1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
  git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash

for

a

given artifact.)

All artifacts were built and staged with:
  mvn release:prepare&&   mvn release:perform

Signing keys are available at

https://www.apache.org/dist/accumulo/KEYS

(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote

against...

[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep  10 23:00:00 UTC 2015
(Thurs Sep  10 20:00:00 EDT 2015 / Thurs Sep  10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
  wget -erobots=off -r -l inf -np -nH \



https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/

  # note the trailing slash is needed










Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser

Michael,

I'd take some time to catch up on some mailing lists like 
general@incubator before passing judgement like that. It's rather hasty.


Michael Ridley wrote:

-1 (nonbinding)

I agree, license issues are important and it sounds like the ASF policy
doesn't leave a lot of room for interpretation

On Thu, Sep 10, 2015 at 2:49 PM, John Vines<vi...@apache.org>  wrote:


-1

I'm with Sean on this one. Ignoring now known licensing issues because we
hadn't handled them in the past is not a valid excuse.

On Thu, Sep 10, 2015 at 2:27 PM Sean Busbey<bus...@cloudera.com>  wrote:


As members of the PMC, we're required to verify all releases we approve

of

meet ASF licensing policy[1], so I don't consider the issues "minor".

Mistakenly violating policy in the past is a different kind of problem

than

moving forward to knowingly violate it.

In particular, not all of the bundled works have copyrights that are
covered under a donation to the Foundation. If we distribute e.g. the
accumulo-core binary jar in its current state the foundation will be
committing willful copyright infringement. The binary tarball (and I'd
imagine the rpm/deb files) have similar problems because we'd be

violating

the terms of the included works' respective licenses.


[1]:
http://www.apache.org/dev/release.html#what-must-every-release-contain

On Thu, Sep 10, 2015 at 11:51 AM, Billie Rinaldi<

billie.rina...@gmail.com

wrote:


Agreed.

On Thu, Sep 10, 2015 at 9:47 AM, Christopher<ctubb...@apache.org>

wrote:

I think the license issues are relatively small compared to the

bugfixes,

especially since we're really trying to close out 1.5.x development.

So,

given the options, I'd prefer to pass RC1, and make the license fixes

in

1.6.x and later, as applicable.

On Thu, Sep 10, 2015 at 12:28 PM Josh Elser<els...@apache.org>

wrote:

Thanks again for taking the time to inspect things so thoroughly,

Sean.

Others who have already voted, I'd ask for your opinion on whether

we

should sink this release (instead of me blindly going by majority

rule).

Personally, I'm presently of the opinion that, given the severity

of

the

bug(s) fixed in this release already, RC1 should pass. Considering

that

we've been making releases like this for quite some time w/o issue

and

1.5 is all but dead, let's push this release out, (again) table 1.5

and

then make these improvements to 1.6 before we cut an RC there

when

we have time to thoroughly vet the changes (instead of the 11th

hour

of

a vote).

If there's a need for lengthy discussion, let's break this off the

VOTE

thread (I leave this message here for visibility).

- Josh

Sean Busbey wrote:

-1

* signatures check out
* checksums match
* licensing errors noted in ACCUMULO-3988

On Sat, Sep 5, 2015 at 4:27 PM, Josh Elser<els...@apache.org>

wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
  12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
  1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
  git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the

signature/hash

for

a

given artifact.)

All artifacts were built and staged with:
  mvn release:prepare&&   mvn release:perform

Signing keys are available at

https://www.apache.org/dist/accumulo/KEYS

(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote

against...

[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep  10 23:00:00 UTC 2015
(Thurs Sep  10 20:00:00 EDT 2015 / Thurs Sep  10 17:00:00 PDT

2015)

Thanks!

P.S. Hint: download the whole staging repo with
  wget -erobots=off -r -l inf -np -nH \



https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/

  # note the trailing slash is needed







--
Sean







Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser



Sean Busbey wrote:

We can't tie the ability to vote -1 on a release to volunteering to fix the
issue that causes a -1. Presuming a release is valued by the community, the
work will get done.

At the same time, it is crappy for Josh to be expected to fix everything,
especially if he doesn't want to fill the role of janitor. We all have
plenty of other things to do, but so does he.


Thanks.


So let's deal with the matter of the vote at hand first. After that we can
deal with fixing things, hopefully with Josh abstaining. (Josh I'd
recommend un-assigning yourself from the issue if you'd prefer someone else
take it up.)



I'll likely make time tonight to bash through as many of the issues as I 
can. The assumption was that if 1.5.4 was going to happen "now", 
I would be the one who had to do it, hence my quick self-assign. 
Hopefully that was not interpreted as I'm doing it all (I think not 
since Christopher had asked me how he could help too).


I'm not sure why I have to abstain... but I think I've already said my 
piece on the matter. Eric has told me in confidence (that I'm apparently 
going to violate) that he has drafted a number of responses already, but 
hasn't sent them due to being frustrated.


At risk of suggesting that we vote on a vote.. what do you think needs 
to happen now?


[RESULT] [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser

RC1 has failed due to licensing concerns.

Josh Elser wrote:

Also, a heads-up since I had one question about this already: you
(hopefully) will notice that this was signed using a different key than
previously for me. This is expected.

I built this release on a virtual server (under my virtual but not
physical control). As such, I did not feel comfortable placing my
existing private key on this server and created a new one instead.

You'll want to recv-keys on my new key when verifying checksums: `gpg
--keyserver pgp.mit.edu --recv-keys AB471AE9`

Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz


Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz


(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep 10 23:00:00 UTC 2015
(Thurs Sep 10 20:00:00 EDT 2015 / Thurs Sep 10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/

# note the trailing slash is needed


Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser

Sean Busbey wrote:

>  So let's deal with the matter of the vote at hand first. After that we can

>>  deal with fixing things, hopefully with Josh abstaining. (Josh I'd
>>  recommend un-assigning yourself from the issue if you'd prefer someone
>>  else
>>  take it up.)
>>
>>

>  I'll likely make time tonight to bash through as many of the issues as I
>  can. The assumption was that if 1.5.4 was going to happen "now", I
>  would be the one who had to do it, hence my quick self-assign. Hopefully
>  that was not interpreted as I'm doing it all (I think not since Christopher
>  had asked me how he could help too).
>
>  I'm not sure why I have to abstain... but I think I've already said my
>  piece on the matter. Eric has told me in confidence (that I'm apparently
>  going to violate) that he has drafted a number of responses already, but
>  hasn't sent them due to being frustrated.
>
>

I still feel weird taking on work that someone else is actively working on.
But making a subtask as Christopher has done works well for me.


I see you've already moved forward. You have my blessing (again). I will 
catch up tonight with progress as defined.


Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser
I _think_ we put ourselves in hot water too if that becomes the norm, but 
I understand the point. I just saw that as an assumption a user could make. 
Specifically, the user who asked for this release, James, had asked for 
a release, which is why I was confused by your comment.


dlmar...@comcast.net wrote:

  Not suggesting encouragement to use the release. But if someone is stuck on 
1.5, and needs the fix, there is an alternative. I speak from experience on 
this issue w/r/t a different Apache project.


-Original Message-
From: Christopher [mailto:ctubb...@apache.org]
Sent: Thursday, September 10, 2015 5:30 PM
To: dev@accumulo.apache.org
Subject: Re: [VOTE] Accumulo 1.5.4-rc1

Not releasing is a viable option, but we can't encourage users to use an
unreleased version of our code. That's not an appropriate substitute for
releasing.

On Thu, Sep 10, 2015 at 4:19 PM<dlmar...@comcast.net>  wrote:


If the critical issues are fixed in 1.5.x, and someone needs them,
can't they check out the source and build it themselves? Is that a viable

option?

- Original Message -

From: "Christopher"<ctubb...@apache.org>
To: dev@accumulo.apache.org
Sent: Thursday, September 10, 2015 3:44:20 PM
Subject: Re: [VOTE] Accumulo 1.5.4-rc1

The larger concern I have is that expecting it to be fixed prior to
1.5.4 might mean loss of willingness to create an RC2 for 1.5.4 and
release it at all. Recall, the 1.5 branch was only revived at all to
fix some critical issues and move on. It's still a viable alternative
to abandon 1.5.x and focus on fixing these issues in 1.6, and later,
where we've made dramatic improvements to the build. Fixing these
newly identified issues is going to take some time and effort. It's
not that it can't be done for 1.5.4... but the question is... who is going to do

it? Is anybody who issued a "-1"

willing to step up and resolve these issues? Or will it rely on Josh?
I'm currently looking at some of the issues to try to help out if I
can, but I also have other obligations.

On Thu, Sep 10, 2015 at 3:33 PM Alex Moundalexis<al...@cloudera.com>
wrote:


-1 (non-binding)

Fix now and it'll be fixed here and in 1.6.x.

On Thu, Sep 10, 2015 at 2:46 AM, Sean Busbey<bus...@cloudera.com>

wrote:

-1

* signatures check out
* checksums match
* licensing errors noted in ACCUMULO-3988

On Sat, Sep 5, 2015 at 4:27 PM, Josh Elser<els...@apache.org>  wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1
039

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1
039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1
039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the
signature/hash

for a

given artifact.)

All artifacts were built and staged with:
mvn release:prepare&&  mvn release:perform

Signing keys are available at

https://www.apache.org/dist/accumulo/KEYS

(Expected fingerprint:

ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote

against...

[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep 10 23:00:00 UTC 2015 (Thurs Sep
10 20:00:00 EDT 2015 / Thurs Sep 10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with wget
-erobots=off -r -l inf -np -nH \



https://repository.apache.org/content/repositories/orgapacheaccumulo-1
039/

# note the trailing slash is needed




--
Sean







Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser

Christopher wrote:

The larger concern I have is that expecting it to be fixed prior to 1.5.4
might mean loss of willingness to create an RC2 for 1.5.4 and release it at
all. Recall, the 1.5 branch was only revived at all to fix some critical
issues and move on. It's still a viable alternative to abandon 1.5.x and
focus on fixing these issues in 1.6, and later, where we've made dramatic
improvements to the build. Fixing these newly identified issues is going to
take some time and effort. It's not that it can't be done for 1.5.4... but
the question is... who is going to do it? Is anybody who issued a "-1"
willing to step up and resolve these issues? Or will it rely on Josh? I'm
currently looking at some of the issues to try to help out if I can, but I
also have other obligations.



Thanks for consideration, Christopher.

While people sharing their opinion is appreciated, a repeated reaction 
without offered effort behind it can come across in bad taste. Not 
trying to discourage the conversation I asked for; I just would love 
to come back and see a patch to _fix_ one of the many nits outlined 
instead of another "me too".


Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-10 Thread Josh Elser
Well.. yeah. It is open source. I don't think you needed someone to tell 
you that though.


There are lots of issues in telling people "just build the code 
yourself", probably the biggest being a rather negative experience for 
the user. For example, I'd be frustrated if I wanted to use MySQL and 
had to build it by hand to get a _data loss bugfix_.


Going out on a limb and guessing that you're ultimately trying to say 
"it'll get released anyways", that doesn't address the fact that no one 
is actually volunteering to do the work.


dlmar...@comcast.net wrote:

If the critical issues are fixed in 1.5.x, and someone needs them, can't they 
check out the source and build it themselves? Is that a viable option?

- Original Message -

From: "Christopher"<ctubb...@apache.org>
To: dev@accumulo.apache.org
Sent: Thursday, September 10, 2015 3:44:20 PM
Subject: Re: [VOTE] Accumulo 1.5.4-rc1

The larger concern I have is that expecting it to be fixed prior to 1.5.4
might mean loss of willingness to create an RC2 for 1.5.4 and release it at
all. Recall, the 1.5 branch was only revived at all to fix some critical
issues and move on. It's still a viable alternative to abandon 1.5.x and
focus on fixing these issues in 1.6, and later, where we've made dramatic
improvements to the build. Fixing these newly identified issues is going to
take some time and effort. It's not that it can't be done for 1.5.4... but
the question is... who is going to do it? Is anybody who issued a "-1"
willing to step up and resolve these issues? Or will it rely on Josh? I'm
currently looking at some of the issues to try to help out if I can, but I
also have other obligations.

On Thu, Sep 10, 2015 at 3:33 PM Alex Moundalexis<al...@cloudera.com>  wrote:


-1 (non-binding)

Fix now and it'll be fixed here and in 1.6.x.

On Thu, Sep 10, 2015 at 2:46 AM, Sean Busbey<bus...@cloudera.com>  wrote:


-1

* signatures check out
* checksums match
* licensing errors noted in ACCUMULO-3988

On Sat, Sep 5, 2015 at 4:27 PM, Josh Elser<els...@apache.org>  wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare&&  mvn release:perform

Signing keys are available at

https://www.apache.org/dist/accumulo/KEYS

(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep 10 23:00:00 UTC 2015
(Thurs Sep 10 20:00:00 EDT 2015 / Thurs Sep 10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \



https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/

# note the trailing slash is needed




--
Sean






Re: sync datacenter

2015-09-09 Thread Josh Elser
If you already have the data in one datacenter, ExportTable and 
ImportTable are the way to go.


http://accumulo.apache.org/1.7/examples/export.html

If you have new data to write to multiple locations, you might consider 
trying out the data center replication feature in 1.7.0.


http://accumulo.apache.org/1.7/accumulo_user_manual.html#_replication
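
For the export/import route, a rough sketch against the TableOperations API 
(paths and table names here are made up; the table has to be offline for the 
export and stay offline while the files are copied):

TableOperations ops = conn.tableOperations();
ops.offline("mytable");                         // required before exporting
ops.exportTable("mytable", "/export/mytable");  // writes metadata plus a distcp file list
// copy the files listed in /export/mytable/distcp.txt to the other cluster,
// then on the destination instance:
ops.importTable("mytable", "/import/mytable");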

z11373 wrote:

Hi,
Is there a best practice for replicating Accumulo data in 2 datacenters
relatively fast and without incurring downtime?
One option I can think of is to have the app writes to Accumulo in both
datacenters.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/sync-datacenter-tp15087.html
Sent from the Developers mailing list archive at Nabble.com.


Re: sync datacenter

2015-09-09 Thread Josh Elser
The Import/Export route won't have any downtime on the "source" system. 
You can clone the source table, and use that to run the export. On the 
"destination" system, yes, you will only have the data since the last 
import.


One thing I didn't think about before is that I'm not sure you can 
import to a table that already exists. If you're doing this on a regular 
schedule, you would have to do some extra coordination. These snapshots 
are full snapshots. There is no incremental snapshot support.


As the source table grows, yes, copying the data from one system to the 
other (typically, using distcp) will take more and more time.


The above limitations are the basis for what the replication feature aims 
to solve. Import/Export table, however, is much simpler and 
better-tested than replication.
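
A hedged sketch of the clone-then-export variant (names invented); only the 
clone goes offline, so the source table keeps serving:

TableOperations ops = conn.tableOperations();
ops.clone("mytable", "mytable_snap", true,      // flush=true captures recent writes
    Collections.<String,String>emptyMap(), Collections.<String>emptySet());
ops.offline("mytable_snap");
ops.exportTable("mytable_snap", "/export/mytable_snap");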


z11373 wrote:

Thanks Josh for the links.

It seems to me if we're going with ImportTable, it'll incur downtime when
importing the data to the target table?

Also, the table is growing as time goes on, so the whole export/import
table process may take longer going forward, is that correct?


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/sync-datacenter-tp15087p15090.html
Sent from the Developers mailing list archive at Nabble.com.


Re: [VOTE] Accumulo 1.5.4-rc1

2015-09-08 Thread Josh Elser
Also, a heads-up since I had one question about this already: you 
(hopefully) will notice that this was signed using a different key than 
previously for me. This is expected.


I built this release on a virtual server (under my virtual but not 
physical control). As such, I did not feel comfortable placing my 
existing private key on this server and created a new one instead.


You'll want to recv-keys on my new key when verifying checksums: `gpg 
--keyserver pgp.mit.edu --recv-keys AB471AE9`


Josh Elser wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz

Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at
https://accumulo.apache.org/release_notes/1.5.4

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep 10 23:00:00 UTC 2015
(Thurs Sep 10 20:00:00 EDT 2015 / Thurs Sep 10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/
# note the trailing slash is needed


[VOTE] Accumulo 1.5.4-rc1

2015-09-05 Thread Josh Elser

Accumulo Developers,

Please consider the following candidate for Accumulo 1.5.4.

Git Commit:
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c
Branch:
1.5.4-rc1

If this vote passes, a gpg-signed tag will be created using:
git tag -f -m 'Apache Accumulo 1.5.4' -s 1.5.4 
12a1041dcbb7f3b10543c305f27ece4b0d65ab9c


Staging repo: 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039
Source (official release artifact): 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-src.tar.gz
Binary: 
https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/org/apache/accumulo/accumulo/1.5.4/accumulo-1.5.4-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a 
given artifact.)


All artifacts were built and staged with:
mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: ABC8914C675FAD3FA74F39B2D146D62CAB471AE9)

Release notes (in progress) can be found at 
https://accumulo.apache.org/release_notes/1.5.4


Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.5.4 release of Apache Accumulo.

This vote will end on Thurs Sep  10 23:00:00 UTC 2015
(Thurs Sep  10 20:00:00 EDT 2015 / Thurs Sep  10 17:00:00 PDT 2015)

Thanks!

P.S. Hint: download the whole staging repo with
wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1039/
# note the trailing slash is needed


Re: exception thrown during minor compaction

2015-09-01 Thread Josh Elser

Keith Turner wrote:

>  Thanks Eric and Josh.
>
>  There shouldn't be delete marker because my code doesn't perform any delete
>  operation, right?
>
>  Josh: if that out-of-the-box SummingCombiner cannot handle delete marker,
>  then I'd think that's bug:-)
>


  https://issues.apache.org/jira/browse/ACCUMULO-2232



Aside: it cracks me up that I have no recollection of this conversation 
anymore, much less running into (what might be) the same bug :P


Re: Specify Range for data in specific column family

2015-09-01 Thread Josh Elser

That's very likely not ever going to happen.

Ranges are used for identifying portions of a table to scan. These 
ranges are identified by start and end keys. Thus, the granularity of 
Key itself defines what is valid in a Range (e.g. you can start at a 
row+cf and end at a row+cf+cq).


A second way to limit the data you receive is by requesting a limited 
collection of columns via the fetchColumns API methods. These methods 
are agnostic of the range of data being consumed, only concerned with 
the columns being fetched.


I don't think it makes any sense to try to mash them together :)
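
For example, a small fragment (table/family names invented; conn is an 
existing Connector):

Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
scanner.setRange(new Range());                 // whole table
scanner.fetchColumnFamily(new Text("mycf"));   // only this column family
for (Map.Entry<Key,Value> entry : scanner) {
  // only entries in family "mycf" arrive here
}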

z11373 wrote:

Thanks Josh! I wish a future version of the Accumulo API would allow us to
specify only the column family in the Range, so that AccumuloInputFormat.setRanges will
work for my case. This is low pri though, as it has fetchColumns as
alternative :-)


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Specify-Range-for-data-in-specific-column-family-tp15012p15031.html
Sent from the Developers mailing list archive at Nabble.com.


Re: exception thrown during minor compaction

2015-09-01 Thread Josh Elser

z11373 wrote:

Thanks Eric and Josh.

There shouldn't be delete marker because my code doesn't perform any delete
operation, right?


Correct.


Josh: if that out-of-the-box SummingCombiner cannot handle delete marker,
then I'd think that's bug :-)


Yes, definitely. Sounds like that is not the case however. Will wait to 
hear back with what you find.




Thanks,
Z


Re: exception thrown during minor compaction

2015-08-31 Thread Josh Elser
An empty value seems to imply that you wrote some unexpected data to 
your table. The following code does work correctly (using 1.7.0).


byte[] bytes = new LongLexicoder().encode(1L);
System.out.println(Arrays.toString(bytes));             // the encoded bytes
System.out.println(new LongLexicoder().decode(bytes));  // prints 1

Can you check the code you're using to create Mutations and verify that 
you're passing in the bytes from the LongLexicoder encode() method as 
the Value?


You can try removing the SummingCombiner from your table which should 
let you scan the table to verify the records (and also let your 
compactions happen). Because you removed the VersioningIterator, it 
should preserve all of the entries you have in the table.
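
For reference, detaching it can look roughly like this (reusing the names 
from your snippet below):

conn.tableOperations().removeIterator(tableName, SUM_COMBINERS_NAME,
    EnumSet.allOf(IteratorUtil.IteratorScope.class));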


z11373 wrote:

Hi,
I attach a summing combiner into a newly created table. The snippet code is
something like:

EnumSet<IteratorScope> iteratorScopes = EnumSet.allOf(IteratorScope.class);

// first, remove versioning iterator since it will not work with combiner
conn.tableOperations().removeIterator(tableName, VERS_ITERATOR_NAME,
    iteratorScopes);

// create the combiner setting, in this case it is SummingCombiner, which will
// sum value of all rows with same key (different timestamp is considered same)
// and result in single row with that key and aggregate value
IteratorSetting setting = new IteratorSetting(COMBINERS_PRIORITY,
    SUM_COMBINERS_NAME, SummingCombiner.class);

// set the combiner to apply to all columns
SummingCombiner.setCombineAllColumns(setting, true);

// need to set encoding type, otherwise exception will be thrown during scan
SummingCombiner.setEncodingType(setting, LongLexicoder.class);

// attach the combiner to the table
conn.tableOperations().attachIterator(tableName, setting, iteratorScopes);


As you see from the code above, I use LongLexicoder class as the encoding
type.
The mutation I add for that table will be unique row id, a string for column
family, empty column qualifier, and the value is "new
LongLexicoder().encode(1L)", so basically the value is 1.

It runs fine up to a certain point (and I can see rows are inserted into the
table), but then it hangs.
Looking at the tablet server logs I found:

2015-08-31 17:59:42,371 [tserver.MinorCompactor] WARN : MinC failed (0) to
create
hdfs://:/accumulo/tables/l/default_tablet/F9gp.rf_tmp
retrying ...
java.lang.ArrayIndexOutOfBoundsException: 0
 at
org.apache.accumulo.core.client.lexicoder.ULongLexicoder.decode(ULongLexicoder.java:60)
 at
org.apache.accumulo.core.client.lexicoder.LongLexicoder.decode(LongLexicoder.java:33)
 at
org.apache.accumulo.core.client.lexicoder.LongLexicoder.decode(LongLexicoder.java:25)
 at
org.apache.accumulo.core.iterators.TypedValueCombiner$VIterator.hasNext(TypedValueCombiner.java:82)
 at
org.apache.accumulo.core.iterators.user.SummingCombiner.typedReduce(SummingCombiner.java:31)
 at
org.apache.accumulo.core.iterators.user.SummingCombiner.typedReduce(SummingCombiner.java:27)
 at
org.apache.accumulo.core.iterators.TypedValueCombiner.reduce(TypedValueCombiner.java:182)
 at
org.apache.accumulo.core.iterators.Combiner.findTop(Combiner.java:166)
 at
org.apache.accumulo.core.iterators.Combiner.next(Combiner.java:147)
 at
org.apache.accumulo.tserver.Compactor.compactLocalityGroup(Compactor.java:505)
 at org.apache.accumulo.tserver.Compactor.call(Compactor.java:362)
 at
org.apache.accumulo.tserver.MinorCompactor.call(MinorCompactor.java:96)
 at org.apache.accumulo.tserver.Tablet.minorCompact(Tablet.java:2072)
 at org.apache.accumulo.tserver.Tablet.access$4400(Tablet.java:172)
 at
org.apache.accumulo.tserver.Tablet$MinorCompactionTask.run(Tablet.java:2159)
 at
org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
 at
org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at
org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
 at
org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
 at java.lang.Thread.run(Thread.java:722)


Looking at Accumulo's source code in ULongLexicoder.java, it looks like the
array is empty, hence it throws the ArrayIndexOutOfBoundsException.

Re: exception thrown during minor compaction

2015-08-31 Thread Josh Elser
Well that's curious. Certainly looks like it's doing the right thing. I 
wonder if there's an edge case with the Lexicoder that's causing it to 
write bad data...


Pretty much the only thing I can think of that wouldn't require code is 
the `grep` shell command. I believe the implementation will also inspect 
the value. Everything else, like you point out, is operating on the keys.


The code-writing option is to write your own Filter implementation that 
only accepts empty values, but that may be more work than just dumping 
the contents of the table and using some `grep` magic in the shell. I'll 
let you decide which is more work :)
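
If you do go the code route, the Filter would be tiny. A sketch, extending 
org.apache.accumulo.core.iterators.Filter, with a class name I made up:

public class EmptyValueFilter extends Filter {
  @Override
  public boolean accept(Key k, Value v) {
    return v.getSize() == 0; // keep only entries whose Value is empty
  }
}

You could attach it at scan time, e.g. scanner.addScanIterator(new 
IteratorSetting(50, "emptyvals", EmptyValueFilter.class)), without 
installing anything on the table.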


z11373 wrote:

Thanks Josh for the quick reply!
Yes, my code is in one place, which always insert 1L as value.

LongEncoder encoder = new LongEncoder();
Value countValue = new Value(encoder.encode(1L));
Mutation m = new Mutation(key);
m.put(name, new Text(), countValue);

It works fine until a certain point in the ingestion (there are millions of
entries).
Looking at the code, I can't think what would cause it to insert an empty or
null value since it's explicitly hardcoded with 1L.
Is there a way to scan that table and find the keys whose value is empty?
I know that we can scan by key but not by value, so I guess this is not
possible, unless I go thru all of the rows.

Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/exception-thrown-during-minor-compaction-tp15010p15013.html
Sent from the Developers mailing list archive at Nabble.com.


Re: Specify Range for data in specific column family

2015-08-31 Thread Josh Elser
Use `[batch]scanner.fetchColumnFamily(Text)` to specify the column 
family. All rows would just be specified by `Collections.singleton(new 
Range())`.
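
For AccumuloInputFormat, the equivalent configuration would look something 
like this (if memory serves, a null qualifier in the Pair fetches the whole 
family):

AccumuloInputFormat.setRanges(job, Collections.singleton(new Range()));
AccumuloInputFormat.fetchColumns(job,
    Collections.singleton(new Pair<Text,Text>(new Text("mycf"), null)));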


- Josh

z11373 wrote:

Hi,
AccumuloInputFormat has setRanges method which takes collection of Range.
I want it to process all rows with specific column family name.
However, I couldn't find a way to do so from Range.exact or Range.prefix;
all of them require specifying a row id. Do you know if there is a way to
achieve what I am looking for? I haven't tried specifying empty Text for the
row, the doc also doesn't say about it.

Thanks,
zainal



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Specify-Range-for-data-in-specific-column-family-tp15012.html
Sent from the Developers mailing list archive at Nabble.com.


Re: exception thrown during minor compaction

2015-08-31 Thread Josh Elser

Ah, ok. Thanks for clarifying.

If that is indeed the cause here, it's a bug in our provided Combiners. 
Z is just using the "user" iterators that we provide and invoking the 
API. Based on provided usage earlier, he's doing the right thing.


If you can verify, Z, that'd be really helpful!
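
For illustration only, the unsafe pattern Eric describes inside an iterator 
looks something like:

if (key.isDeleted()) {
  // a delete marker carries an empty Value by design; propagate the
  // marker untouched rather than handing its Value to the decoder
} else {
  long count = encoder.decode(value.get());
}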

Eric Newton wrote:

Sure, but not during a compaction that does not involve all the underlying
files. The delete keys must be propagated.

I'm not completely familiar with the underlying libraries that help you
write iterators, I just know it's a common mistake.


On Mon, Aug 31, 2015 at 11:53 PM, Josh Elser<josh.el...@gmail.com>  wrote:


Shouldn't the delete be masked at a lower layer (DeletingIterator)? Or am
I forgetting that Combiners see that value somehow (and maybe
SummingCombiner is broken)?


Eric Newton wrote:


You may be seeing a delete marker.  In a compaction, you will see delete
markers, which have an empty value. You will have to check the delete flag
on the key before grabbing the Value.

-Eric

On Mon, Aug 31, 2015 at 8:28 PM, z11373<z11...@outlook.com>   wrote:

Thanks Josh! I am going to do more experiments, because this is really

weird.
I'll post the update if there are any interesting stuff I found out
later.

Thanks,
Z



--
View this message in context:

http://apache-accumulo.1065345.n5.nabble.com/exception-thrown-during-minor-compaction-tp15010p15017.html
Sent from the Developers mailing list archive at Nabble.com.





