[jira] [Commented] (ACCUMULO-4739) Make 3rd party web resources (js, css) location configurable
[ https://issues.apache.org/jira/browse/ACCUMULO-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252651#comment-16252651 ] Christopher Tubbs commented on ACCUMULO-4739: - [~lstav] sent me the following example code for how we might be able to do this. I'm not sure if it will work as is, or if we'd want to do something different. But it does assume that our configuration represents some sort of list-like structure (vs. a text blob to drop into the header, which might be easier). He says: {code} On the Java side you would need something like this: List
[jira] [Created] (ACCUMULO-4739) Make 3rd party web resources (js, css) location configurable
Christopher Tubbs created ACCUMULO-4739:
---

Summary: Make 3rd party web resources (js, css) location configurable
Key: ACCUMULO-4739
URL: https://issues.apache.org/jira/browse/ACCUMULO-4739
Project: Accumulo
Issue Type: Task
Components: monitor
Reporter: Christopher Tubbs
Assignee: Michael Miller
Priority: Blocker
Fix For: 2.0.0

Currently, in the new monitor for 2.0 (after ACCUMULO-3005), some 3rd party web resources are accessed via an external CDN. This is suitable in many cases, but could be problematic for client browsers that are not currently connected to the internet and have no cached copy of the resources from the CDN. These resources include bootstrap and jquery. Flot is also a 3rd party resource, but is currently bundled with Accumulo and served by the monitor.

The location of these resources should be made configurable, so that they can be bundled with, and served by, the Accumulo monitor instead of an internet-based CDN. Making the locations configurable also lets administrators update them, for example to avoid a bug in a particular version of jquery or to use a different bootstrap theme.

Any new configuration option added to support making these configurable should be capable of supporting an arbitrary number of script and stylesheet resources, and possibly other resource types, as well as any accompanying integrity/crossorigin attributes for CDN access (see server/monitor/src/main/resources/templates/default.ftl for current values).

Also, I think the default value should point to the CDN, and not the locally bundled and served resources, so that the browser can take advantage of any caching for these commonly used resources.

This would allow us to achieve ACCUMULO-2983 by no longer bundling these third party resources, while still supporting bundling if needed.
To complete this issue, we basically need 2 things:
# Ensure the monitor serves (at a predictable location) whatever arbitrary static resources it finds on the class path (so users can bundle their own static resources), and
# Ensure resources are configurable to point to either the served versions or versions on a CDN.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
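The list-like configuration described in this issue could be sketched as follows. This is a minimal illustration only: the value format (`type|url|integrity|crossorigin` entries separated by `;`), the class name `ExternalResources`, and the rendering are all assumptions made for the example, not something the monitor is known to implement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns a list-like config value into HTML header tags. */
final class ExternalResources {

  /** One script or stylesheet entry; integrity and crossorigin may be empty. */
  static final class Resource {
    final String type; // "js" or "css"
    final String url;
    final String integrity;
    final String crossorigin;

    Resource(String type, String url, String integrity, String crossorigin) {
      this.type = type;
      this.url = url;
      this.integrity = integrity;
      this.crossorigin = crossorigin;
    }

    /** Renders a script or link tag, adding the optional attributes only when set. */
    String toTag() {
      StringBuilder attrs = new StringBuilder();
      if (!integrity.isEmpty()) {
        attrs.append(" integrity=\"").append(escape(integrity)).append('"');
      }
      if (!crossorigin.isEmpty()) {
        attrs.append(" crossorigin=\"").append(escape(crossorigin)).append('"');
      }
      if (type.equals("css")) {
        return "<link rel=\"stylesheet\" href=\"" + escape(url) + "\"" + attrs + ">";
      }
      return "<script src=\"" + escape(url) + "\"" + attrs + "></script>";
    }
  }

  /** Escapes characters that could break out of a quoted HTML attribute. */
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("\"", "&quot;")
        .replace("<", "&lt;").replace(">", "&gt;");
  }

  /** Parses entries of the assumed form type|url|integrity|crossorigin, separated by ';'. */
  static List<Resource> parse(String value) {
    List<Resource> resources = new ArrayList<>();
    for (String entry : value.split(";")) {
      entry = entry.trim();
      if (entry.isEmpty()) {
        continue;
      }
      String[] f = entry.split("\\|", -1);
      if (f.length < 2 || !(f[0].equals("js") || f[0].equals("css"))) {
        throw new IllegalArgumentException("Bad resource entry: " + entry);
      }
      resources.add(new Resource(f[0], f[1],
          f.length > 2 ? f[2] : "", f.length > 3 ? f[3] : ""));
    }
    return resources;
  }
}
```

With a format like this, a locally served resource would simply omit the integrity and crossorigin fields, while a CDN entry would carry both.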
Accumulo-Master - Build # 2179 - Still Failing
The Apache Jenkins build system has built Accumulo-Master (build #2179) Status: Still Failing Check console output at https://builds.apache.org/job/Accumulo-Master/2179/ to view the results.
[jira] [Updated] (ACCUMULO-4714) Create landing page for new developers
[ https://issues.apache.org/jira/browse/ACCUMULO-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ACCUMULO-4714:
-
Labels: pull-request-available (was: )

> Create landing page for new developers
> --
>
> Key: ACCUMULO-4714
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4714
> Project: Accumulo
> Issue Type: Improvement
> Components: website
> Reporter: Michael Miller
> Assignee: Mark Owens
> Labels: pull-request-available
>
> The website has a lot of good information for contributing to Accumulo but it is scattered across multiple pages. There is no clear, concise page that can be sent as a link to developers interested in committing to the project. I feel like this is a turn off for someone who is interested in contributing to Accumulo.
> This page would be a good place but it is just a bunch of links: https://accumulo.apache.org/contributor/
> As a recent newcomer I would tend to go here: https://accumulo.apache.org/contributor/source
> But this page is confusing. The first instructions you get (after more links) explain how to build the website. Then when you get to the developers guide the very first thing is a paragraph about activating the Thrift profile. While this information is all very useful, the first 2 scenarios are edge cases of development and it does not ease a new developer into writing code for Accumulo.
[GitHub] ctubbsii commented on issue #37: ACCUMULO-4714 Create landing page for new developers
ctubbsii commented on issue #37: ACCUMULO-4714 Create landing page for new developers URL: https://github.com/apache/accumulo-website/pull/37#issuecomment-344416089 Don't forget to use the correct syntax when referencing a JIRA issue number. The title of this GitHub issue was "Accumulo 4714". I corrected it to be "ACCUMULO-4714" so it would properly link to the JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (ACCUMULO-4730) Create an Entry length summarizer
[ https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner resolved ACCUMULO-4730.

Resolution: Fixed

> Create an Entry length summarizer
> -
>
> Key: ACCUMULO-4730
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Keith Turner
> Assignee: Jared R
> Labels: newbie, pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> It would be very useful to have a built-in [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java] that computes summary information about field lengths, specifically key length, row length, family length, qualifier length, visibility length, and value length. Whatever stats are computed must be computable incrementally. For example, min, max, count, sum, and a log2 histogram can all be computed incrementally. I think these would be good stats to start with. Count and sum can be used to compute the average. There is an example of computing a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and possibly produce summaries like the following.
> {noformat}
> count=XXX //do not need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX //only output non zero exponents
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers] package.
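The requirement that every statistic be computable incrementally can be illustrated with a small standalone sketch. The class `LengthStatsSketch` is hypothetical and independent of the Summarizer API. It uses floor(log2(length)) for the histogram bucket, which is an assumption chosen for simplicity and may differ from the rounding used in the real implementation.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of incrementally computed, mergeable length statistics. */
final class LengthStatsSketch {
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;
  private long sum = 0;
  private long count = 0;
  private final long[] logHist = new long[32];

  /** Folds one observed length into the running statistics. */
  void accept(int length) {
    min = Math.min(min, length);
    max = Math.max(max, length);
    sum += length;
    count++;
    // Assumption: bucket index is floor(log2(length)); lengths 0 and 1 share bucket 0.
    int bucket = length == 0 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
    logHist[bucket]++;
  }

  /** Merges another partial result; possible because each stat is associative. */
  void merge(LengthStatsSketch other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    count += other.count;
    for (int i = 0; i < logHist.length; i++) {
      logHist[i] += other.logHist[i];
    }
  }

  /** The average is derived from count and sum rather than tracked separately. */
  double average() {
    return count == 0 ? 0.0 : (double) sum / count;
  }

  /** Emits the stats under the given prefix, skipping empty histogram buckets. */
  Map<String, Long> summarize(String prefix) {
    Map<String, Long> out = new TreeMap<>();
    out.put("count", count);
    out.put(prefix + ".min", count == 0 ? 0L : min);
    out.put(prefix + ".max", count == 0 ? 0L : max);
    out.put(prefix + ".sum", sum);
    for (int i = 0; i < logHist.length; i++) {
      if (logHist[i] > 0) {
        out.put(prefix + ".logHist." + i, logHist[i]);
      }
    }
    return out;
  }
}
```

Because min, max, sum, count, and the histogram counts all combine associatively, partial results from separate files can be merged in any order and yield the same summary.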
[GitHub] keith-turner commented on issue #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on issue #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#issuecomment-344413885 I squashed these commits into 9cd4be0432c7b0297d86b19ddeac64ed0feaea87 and pushed that to master. I ran `mvn verify -DskipITs` and it did some slight reformatting which is included in 9cd4be0432c7b0297d86b19ddeac64ed0feaea87
[GitHub] keith-turner closed pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner closed pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java b/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
new file mode 100644
index 00..e7a29161e4
--- /dev/null
+++ b/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+import java.util.function.BiFunction;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of the lengths.
+ *
+ * @since 2.0.0
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for the Collector methods.*/
+  private static class LengthStats {
+    private long min = Long.MAX_VALUE;
+    private long max = Long.MIN_VALUE;
+    private long sum = 0;
+    private long[] counts = new long[32];
+
+    private void accept(int length) {
+      int idx;
+
+      if (length < min) {
+        min = length;
+      }
+
+      if (length > max) {
+        max = length;
+      }
+
+      sum += length;
+
+      if (length == 0) {
+        idx = 0;
+      } else {
+        idx = IntMath.log2(length, RoundingMode.HALF_UP);
+      }
+
+      counts[idx]++;
+    }
+
+    void summarize (String prefix, StatisticConsumer sc) {
+      sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+      sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+      sc.accept(prefix+".sum", sum);
+
+      for (int i = 0; i < counts.length; i++) {
+        if (counts[i] > 0) {
+          sc.accept(prefix+".logHist."+i, counts[i]);
+        }
+      }
+    }
+
+  }
+
+  /* Helper functions for merging that is used by the Combiner. */
+  private static void merge(String key, BiFunction<Long,Long,Long> mergeFunc, Map<String,Long> stats1, Map<String,Long> stats2) {
+    Long mergeVal = stats2.get(key);
+
+    if(mergeVal != null) {
+      stats1.merge(key, mergeVal, mergeFunc);
+    }
+  }
+
+  private static void merge(String prefix, Map<String,Long> stats1, Map<String,Long> stats2) {
+    merge(prefix+".min", Long::min, stats1, stats2);
+    merge(prefix+".max", Long::max, stats1, stats2);
+    merge(prefix+".sum", Long::sum, stats1, stats2);
+    for (int i = 0; i < 32; i++) {
+      merge(prefix+".logHist."+i, Long::sum, stats1, stats2);
+    }
+  }
+
+  @Override
+  public Collector collector(SummarizerConfiguration sc) {
+    return new Collector() {
+
+      private LengthStats keyStats = new LengthStats();
+      private LengthStats rowStats = new LengthStats();
+      private LengthStats familyStats = new LengthStats();
+      private LengthStats qualifierStats = new LengthStats();
+      private LengthStats visibilityStats = new LengthStats();
+      private LengthStats valueStats = new LengthStats();
+      private long total = 0;
+
+      @Override
+      public void accept(Key k, Value v) {
+        keyStats.accept(k.getLength());
+        rowStats.accept(k.getRowData().length());
+        familyStats.accept(k.getColumnFamilyData().length());
+        qualifierStats.accept(k.getColumnQualifierData().length());
+
[jira] [Updated] (ACCUMULO-1242) Consistent logging
[ https://issues.apache.org/jira/browse/ACCUMULO-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs updated ACCUMULO-1242: Fix Version/s: (was: 1.8.2) > Consistent logging > -- > > Key: ACCUMULO-1242 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1242 > Project: Accumulo > Issue Type: Bug > Components: build >Reporter: Christopher Tubbs >Assignee: Ed Coleman > Labels: log4j, logging, logs, slf4j > Fix For: 2.0.0 > > Attachments: ACCUMULO-1242-2.patch, accumulo-slf4j-snapshot.patch, > dynamicLog.tgz > > > Logging dependencies are very inconsistent. It seems we have absolute > dependencies on log4j, yet use slf4j sometimes, and log4j other times. In > some of our tests we have slf4j-nop as a test dependency. > It seems we could consolidate a lot of this if we simply did: > # slf4j-api : compile > # slf4j-log4j12 : runtime > # slf4j-nop : test > # log4j : runtime > We could do this in the parent POM and get rid of all the different > dependencies throughout the code. > I don't know that we could ever use anything other than slf4j-log4j12 as the > implementation (unless our dependencies broke away from using log4j directly > also), but at least we'd clean up all the logging dependencies in our > code/build, and would be ready to switch to something better if something > came along. Further, if somebody wanted to reuse our code, and weren't tied > to log4j, because they didn't need our transitive dependencies that locked in > log4j, they could easily depend on their own slf4j implementation jar, and > all the logging in our code would still work correctly for them without > needing to use something like log4j-over-slf4j.
[jira] [Resolved] (ACCUMULO-2349) Accumulo should give some notification if there is no *_logger.xml file
[ https://issues.apache.org/jira/browse/ACCUMULO-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs resolved ACCUMULO-2349. - Resolution: Won't Fix Fix Version/s: (was: 2.0.0) Closing as "Won't Fix". This is OBE as of 2.0 script improvements, which separated logger config from code. Prerequisites for logging config can now be addressed more easily in user configuration in accumulo-env.sh without us doing anything specific for it. > Accumulo should give some notification if there is no *_logger.xml file > --- > > Key: ACCUMULO-2349 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2349 > Project: Accumulo > Issue Type: Improvement >Reporter: John Vines >Priority: Critical > > Right now if you're missing general_logger and the appropriate > service_logger, it seems accumulo just happily carries on with no logs > whatsoever. We should alter it to complain loudly if it's missing them.
[jira] [Updated] (ACCUMULO-780) Accumulo should have a configurator
[ https://issues.apache.org/jira/browse/ACCUMULO-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs updated ACCUMULO-780: --- Fix Version/s: (was: 2.0.0) > Accumulo should have a configurator > --- > > Key: ACCUMULO-780 > URL: https://issues.apache.org/jira/browse/ACCUMULO-780 > Project: Accumulo > Issue Type: New Feature > Components: scripts >Reporter: John Vines > Attachments: ACCUMULO-780.v1.patch > > > We currently have a few different footprints available for users. We should > use those as a base and allow users to quickly and conveniently configure > their system for whatever size footprint they want. We can use the current as > a baseline and extrapolate/interpolate for whichever setup they're using. > This way we don't have to worry about maintaining a bunch of different sizes > and instead just have an algorithm that needs occasional loving.
[jira] [Commented] (ACCUMULO-1824) How to Make a Good Bug Report
[ https://issues.apache.org/jira/browse/ACCUMULO-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1625#comment-1625 ]

Christopher Tubbs commented on ACCUMULO-1824:
-

Looking at old issues and noticing that this could be better addressed if we used the JIRA servicedesk features, or if we used GitHub for issues and used issue templates.

> How to Make a Good Bug Report
> -
>
> Key: ACCUMULO-1824
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1824
> Project: Accumulo
> Issue Type: Wish
> Components: docs
> Environment: meatspace
> Reporter: Eric Newton
> Priority: Critical
> Labels: Documentation
>
> h3. Details are incredibly important when making a bug report.
> * Provide a reproducible test case:
> ** unit test, or integration test preferred
> ** step-by-step instructions
> * Provide evidence:
> ** logs
> ** transcripts of following step-by-step instructions
> * What version of accumulo are you using?
> ** tagged version: provide the tag, i.e. 1.4.2
> ** branch: provide the hash from the last pull
> * What versions of dependencies are you using:
> ** zookeeper
> ** hadoop
> ** jvm
> ** operating system
> * How many cores, how much memory, how many disks?
> * Is this a virtual machine? Is it running on a commercial cloud?
> * What have you changed in the configuration?
> * Have you reverted local code changes and re-tested?
> * Post every detail you can think of:
> ** config files
> ** test runs
> ** cluster size
> ** hdfs disk usage at the time
> ** load average
> ** phase of the moon (kidding!)
> *** except if something happens periodically ... include the period of time
> *** with specific measurements: "see attached graph"
> ** jstack stuck processes
> ** provide relevant OS settings
> *** "I have set swappiness to zero as suggested in the README"
> *** "No swapping was observed, here's the output of free..."
> *** "The file descriptor limit was set to 1024, but this is a single node instance."
> h3. Things to avoid:
> * lack of specific messages from logs, output, etc.
> ** "I think it said something about tablet not from mars... something like that"
> ** "it doesn't build... maybe missing some jar file"
> * frustration: "I updated to version 1.1 AND NOW NOTHING WORKS!?"
> * generalizations without details:
> ** in 1.2, I could create tables faster than 1.3
> ** servers seem to fail more often than before
> *** before what?
> *** what else changed?
> *** how can I reproduce your problem?
> h3. Cooperate
> Please provide the details if asked:
> * "Can you jstack the server next time and post it?"
> * Can you send your zookeepers the "stat" command and send the output?
> * When this happens, are you running a map-reduce job?
> * If you send me all your tablet server logs, I can probably figure out what happened
> * Are you seeing any errors on the monitor?
> * What version of accumulo are you running?
[jira] [Updated] (ACCUMULO-3389) Iterator Names can't contain dots
[ https://issues.apache.org/jira/browse/ACCUMULO-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-3389:

Fix Version/s: 2.0.0
               1.8.2
               1.7.4

> Iterator Names can't contain dots
> -
>
> Key: ACCUMULO-3389
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3389
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: John Vines
> Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> Attempting to attach an iterator whose name includes dots results in messages on the remote end from IteratorUtil - "Unrecognizable option: ITERNAME". No errors or warnings appear when attaching them. They get added to the config without issue.
> But when you do something like listiter, they don't show up, and then warnings appear in the logs/on the monitor.
> We should either:
> A. allow iterators with dots in the name
> B. document that this isn't allowed and check server side when they are set.
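Option B above can be sketched as a small validation helper. The sketch assumes iterator settings live in dot-delimited property keys of the form `table.iterator.<scope>.<name>`, with options under `.opt.<key>`; that layout is stated here as an assumption, and the class and method names are hypothetical. The naive parser shows why a dotted name becomes ambiguous once it is embedded in such a key.

```java
/** Hypothetical sketch: rejecting iterator names that break dot-delimited keys. */
final class IteratorNameCheck {
  // Assumed key prefix for per-table iterator properties.
  static final String PREFIX = "table.iterator.";

  /** Rejects names that would be ambiguous inside a dot-delimited property key. */
  static void validateName(String name) {
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("Iterator name must not be empty");
    }
    if (name.contains(".")) {
      throw new IllegalArgumentException("Iterator name must not contain dots: " + name);
    }
  }

  /** Builds the property key for an iterator option after validating the name. */
  static String optionKey(String scope, String name, String option) {
    validateName(name);
    return PREFIX + scope + "." + name + ".opt." + option;
  }

  /** Naive parse: the name is taken to be the single token right after the scope. */
  static String parseName(String key) {
    String[] parts = key.substring(PREFIX.length()).split("\\.");
    if (parts.length < 2) {
      throw new IllegalArgumentException("No iterator name in key: " + key);
    }
    return parts[1];
  }
}
```

A key built by hand from the name `my.iter` parses back as `my`, so the rest of the name is misread as part of the option path; checking at attach time turns that silent mismatch into an immediate error.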
[jira] [Resolved] (ACCUMULO-1589) Separate Controller and View logic in Monitor
[ https://issues.apache.org/jira/browse/ACCUMULO-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs resolved ACCUMULO-1589. - Resolution: Fixed Fix Version/s: 2.0.0 I think that the work on ACCUMULO-3005 and related suite of issues satisfies this. > Separate Controller and View logic in Monitor > - > > Key: ACCUMULO-1589 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1589 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Jim Klucar >Priority: Minor > Fix For: 2.0.0 > > > The view code is all just done with StringBuilders which make the HTML near > impossible to decipher in the .java files. The controller and views should > be broken apart with some framework. Many are around, .jsp would be better at > this point.
[jira] [Updated] (ACCUMULO-4677) Sanitize @PathParam and @QueryParam parameters in new REST-based monitor
[ https://issues.apache.org/jira/browse/ACCUMULO-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ACCUMULO-4677: - Labels: pull-request-available (was: ) > Sanitize @PathParam and @QueryParam parameters in new REST-based monitor > > > Key: ACCUMULO-4677 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4677 > Project: Accumulo > Issue Type: Bug > Components: monitor >Reporter: Christopher Tubbs >Assignee: Kyle Van Gilson >Priority: Blocker > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Following on the issue identified in ACCUMULO-4660, I verified that > parameters to the REST-based monitor (ACCUMULO-3005) resources need > sanitization as well. > All {{@PathParam}} and {{@QueryParam}} annotated fields should be sanitized.
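One common way to sanitize request parameters is an allow-list check before the value is used, plus HTML escaping before anything is echoed back. The sketch below uses only the standard library. The pattern, the class name `ParamSanitizer`, and the choice to return null for rejected input are illustrative assumptions, not the approach taken in the monitor.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of allow-list sanitization for request parameters. */
final class ParamSanitizer {
  // Assumption: identifiers such as table IDs or host:port pairs fit this pattern.
  private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_.:\\-]{1,64}");

  /** Returns the value unchanged if it matches the allow-list, otherwise null. */
  static String sanitize(String raw) {
    if (raw == null) {
      return null;
    }
    return SAFE.matcher(raw).matches() ? raw : null;
  }

  /** Escapes text for inclusion in HTML; ampersand is replaced first. */
  static String escapeHtml(String s) {
    return s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;")
        .replace("'", "&#39;");
  }
}
```

Rejecting anything outside a narrow pattern is usually easier to reason about than trying to strip dangerous characters, since the set of valid values for an ID or server address is small.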
[jira] [Resolved] (ACCUMULO-1325) Package monitor as a WAR
[ https://issues.apache.org/jira/browse/ACCUMULO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs resolved ACCUMULO-1325. - Resolution: Won't Fix Monitor was already made a separate package in ACCUMULO-210, and I no longer think making it a WAR is going to add any value. > Package monitor as a WAR > > > Key: ACCUMULO-1325 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1325 > Project: Accumulo > Issue Type: Improvement > Components: monitor >Reporter: Christopher Tubbs > Labels: gsoc2013 > > I've played around with building a Servlet 3.0 war, and have some ideas for > packaging the monitor as a war. > At the very least, it should probably be a separate package.
[GitHub] ctubbsii commented on issue #289: ACCUMULO-4677 Sanitizing PathParam values in REST-based Monitor
ctubbsii commented on issue #289: ACCUMULO-4677 Sanitizing PathParam values in REST-based Monitor URL: https://github.com/apache/accumulo/pull/289#issuecomment-344403538 What's the status of this?
[jira] [Resolved] (ACCUMULO-238) support pluggable authorization providers for the monitor
[ https://issues.apache.org/jira/browse/ACCUMULO-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs resolved ACCUMULO-238. Resolution: Won't Fix Closing this as "Won't Fix", since I think this is basically OBE, due to ACCUMULO-4617 > support pluggable authorization providers for the monitor > - > > Key: ACCUMULO-238 > URL: https://issues.apache.org/jira/browse/ACCUMULO-238 > Project: Accumulo > Issue Type: New Feature > Components: monitor >Reporter: Adam Fuchs >Assignee: Adam Fuchs > Attachments: authority_provider.patch > > > ACCUMULO-196 discusses adding more command and control functionality to the > monitor page. To properly secure these features in our customers' > environments, we will need to be able to interface with different > authorization providers. Enter pluggable authorization provider interface, > stage left.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150921482 ## File path: tour/data-model-code.md ## @@ -0,0 +1,65 @@ +--- +title: Data Model Code +--- + +```java +// Connect to Mini Accumulo as the root user and create a table called "GothamPD". +Connector conn = mac.getConnector("root", "tourguide"); +conn.tableOperations().create("GothamPD"); + +// Create 3 Mutation objects to hold each person of interest. +Mutation mutation1 = new Mutation("id0001"); +Mutation mutation2 = new Mutation("id0002"); +Mutation mutation3 = new Mutation("id0003"); + +// Create key/value pairs for each Mutation, putting them in the appropriate family. +mutation1.put("hero","alias", "Batman"); +mutation1.put("hero","name", "Bruce Wayne"); +mutation1.put("hero","wearsCape?", "true"); +mutation2.put("hero","alias", "Robin"); +mutation2.put("hero","name", "Dick Grayson"); +mutation2.put("hero","wearsCape?", "true"); +mutation3.put("villain","alias", "Joker"); +mutation3.put("villain","name", "Unknown"); +mutation3.put("villain","wearsCape?", "false"); + +// Create a BatchWriter to the GothamPD table and add your mutations to it. Try w/ resources will close for us. Review comment: Could mention that after close all data written is visible to scans.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150962366 ## File path: tour/authorizations.md ## @@ -0,0 +1,42 @@ +--- +title: Authorizations +--- +An Authorization is an authority granted to a User that allows them to read certain data. A column in Accumulo can have +a visibility attached to it, limiting access to the data of that column. Only users who have an authorization equal to +the visibility will be able to read the data in that column. Review comment: One is a set and one is an expression. Could say a column visibility is a boolean expression that is evaluated using a scanner's authorizations. If it evaluates to true, then the data is visible.
[GitHub] milleruntime commented on a change in pull request #38: Continue tour
milleruntime commented on a change in pull request #38: Continue tour URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150953947 ## File path: tour/authorizations.md ## @@ -0,0 +1,42 @@ +--- +title: Authorizations +--- +An Authorization is an authority granted to a User that allows them to read certain data. A column in Accumulo can have +a visibility attached to it, limiting access to the data of that column. Only users who have an authorization equal to +the visibility will be able to read the data in that column. Review comment: That does sound better but I was trying to make it clear that authorization = visibility. This was something that tripped me up as a beginner, they have different names for different reasons but it isn't clear anywhere that it is a simple string compare.
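The distinction drawn in this review thread (authorizations are a set, while a visibility is a boolean expression evaluated against that set) can be illustrated with a toy evaluator. This is a sketch under stated assumptions: it supports only `&`, `|`, parentheses, and alphanumeric labels, it treats an empty visibility as visible to everyone, and the class `VisibilitySketch` is hypothetical. It is not Accumulo's own evaluator.

```java
import java.util.Set;

/** Hypothetical toy evaluator for visibility expressions such as (a&b)|c. */
final class VisibilitySketch {
  private final String expr;
  private final Set<String> auths;
  private int pos = 0;

  private VisibilitySketch(String expr, Set<String> auths) {
    this.expr = expr;
    this.auths = auths;
  }

  /** Returns true if the expression evaluates to true for the given authorization set. */
  static boolean isVisible(String visibility, Set<String> auths) {
    if (visibility.isEmpty()) {
      return true; // assumption: an empty visibility hides nothing
    }
    VisibilitySketch p = new VisibilitySketch(visibility, auths);
    boolean result = p.parseExpr();
    if (p.pos != visibility.length()) {
      throw new IllegalArgumentException("Unexpected character at " + p.pos);
    }
    return result;
  }

  // expr := term (('&' | '|') term)*, with only one operator kind per level
  private boolean parseExpr() {
    boolean value = parseTerm();
    char op = 0;
    while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
      char c = expr.charAt(pos++);
      if (op != 0 && c != op) {
        throw new IllegalArgumentException("Mixing & and | requires parentheses");
      }
      op = c;
      boolean rhs = parseTerm();
      value = (op == '&') ? (value && rhs) : (value || rhs);
    }
    return value;
  }

  // term := label | '(' expr ')'
  private boolean parseTerm() {
    if (pos < expr.length() && expr.charAt(pos) == '(') {
      pos++;
      boolean v = parseExpr();
      if (pos >= expr.length() || expr.charAt(pos) != ')') {
        throw new IllegalArgumentException("Missing closing parenthesis");
      }
      pos++;
      return v;
    }
    int start = pos;
    while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("Expected a label at " + pos);
    }
    // A single label is true exactly when the scanner holds that authorization.
    return auths.contains(expr.substring(start, pos));
  }
}
```

For a visibility that is a single label, evaluation reduces to a membership test, which is why it can look like a simple string compare in the simplest case.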
[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#discussion_r150943664 ## File path: core/src/test/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizersTest.java ## @@ -0,0 +1,1171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License");you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.accumulo.core.client.summary.summarizers; + +import java.util.HashMap; + +import org.apache.accumulo.core.client.summary.summarizers.EntryLengthSummarizer; +import org.apache.accumulo.core.client.summary.Summarizer.Collector; +import org.apache.accumulo.core.client.summary.Summarizer.Combiner; +import org.apache.accumulo.core.client.summary.SummarizerConfiguration; +import org.apache.accumulo.core.data.Key; +import org.apache.accumulo.core.data.Value; +import org.junit.Assert; +import org.junit.Test; + +public class EntryLengthSummarizersTest { + + /* COLLECTOR TEST */ + /* Basic Test: Each test adds to the next, all are simple lengths. 
*/ + + @Test + public void testEmpty() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Collector collector = entrySum.collector(sc); + +HashMap<String,Long> stats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap<String,Long> expected = new HashMap<>(); +expected.put("key.min", 0L); +expected.put("key.max", 0L); +expected.put("key.sum", 0L); + +expected.put("row.min", 0L); +expected.put("row.max", 0L); +expected.put("row.sum", 0L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +expected.put("total", 0L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicRow() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Key k1 = new Key("r1"); +Key k2 = new Key("r2"); +Key k3 = new Key("r3"); + +Collector collector = entrySum.collector(sc); +collector.accept(k1, new Value("")); +collector.accept(k2, new Value("")); +collector.accept(k3, new Value("")); + +HashMap<String,Long> stats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap<String,Long> expected = new HashMap<>(); +expected.put("key.min", 2L); +expected.put("key.max", 2L); +expected.put("key.sum", 6L); + +// Log2 Histogram +expected.put("key.logHist.1", 3L); + +expected.put("row.min", 2L); +expected.put("row.max", 2L); +expected.put("row.sum", 6L); + +// Log2 Histogram +expected.put("row.logHist.1", 3L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +// 
Log2 Histogram +expected.put("family.logHist.0", 3L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +// Log2 Histogram +expected.put("qualifier.logHist.0", 3L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +// Log2 Histogram +expected.put("visibility.logHist.0", 3L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +// Log2 Histogram +expected.put("value.logHist.0", 3L); + +expected.put("total", 3L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicFamily() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new
[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#discussion_r150943789 ## File path: core/src/test/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizersTest.java ## @@ -0,0 +1,1171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License");you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.accumulo.core.client.summary.summarizers; + +import java.util.HashMap; + +import org.apache.accumulo.core.client.summary.summarizers.EntryLengthSummarizer; +import org.apache.accumulo.core.client.summary.Summarizer.Collector; +import org.apache.accumulo.core.client.summary.Summarizer.Combiner; +import org.apache.accumulo.core.client.summary.SummarizerConfiguration; +import org.apache.accumulo.core.data.Key; +import org.apache.accumulo.core.data.Value; +import org.junit.Assert; +import org.junit.Test; + +public class EntryLengthSummarizersTest { + + /* COLLECTOR TEST */ + /* Basic Test: Each test adds to the next, all are simple lengths. 
*/ + + @Test + public void testEmpty() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Collector collector = entrySum.collector(sc); + +HashMapstats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap expected = new HashMap<>(); +expected.put("key.min", 0L); +expected.put("key.max", 0L); +expected.put("key.sum", 0L); + +expected.put("row.min", 0L); +expected.put("row.max", 0L); +expected.put("row.sum", 0L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +expected.put("total", 0L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicRow() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Key k1 = new Key("r1"); +Key k2 = new Key("r2"); +Key k3 = new Key("r3"); + +Collector collector = entrySum.collector(sc); +collector.accept(k1, new Value("")); +collector.accept(k2, new Value("")); +collector.accept(k3, new Value("")); + +HashMap stats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap expected = new HashMap<>(); +expected.put("key.min", 2L); +expected.put("key.max", 2L); +expected.put("key.sum", 6L); + +// Log2 Histogram +expected.put("key.logHist.1", 3L); + +expected.put("row.min", 2L); +expected.put("row.max", 2L); +expected.put("row.sum", 6L); + +// Log2 Histogram +expected.put("row.logHist.1", 3L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +// 
Log2 Histogram +expected.put("family.logHist.0", 3L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +// Log2 Histogram +expected.put("qualifier.logHist.0", 3L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +// Log2 Histogram +expected.put("visibility.logHist.0", 3L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +// Log2 Histogram +expected.put("value.logHist.0", 3L); + +expected.put("total", 3L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicFamily() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new
[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#discussion_r150944124 ## File path: core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License");you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.accumulo.core.client.summary.summarizers; + +import java.math.RoundingMode; +import java.util.Map; +import java.util.function.BiFunction; + +import org.apache.accumulo.core.client.summary.Summarizer; +import org.apache.accumulo.core.client.summary.SummarizerConfiguration; +import org.apache.accumulo.core.data.Key; +import org.apache.accumulo.core.data.Value; + +import com.google.common.math.IntMath; + +/** + * Summarizer that computes summary information about field lengths. + * Specifically key length, row length, family length, qualifier length, visibility length, and value length. + * Incrementally computes minimum, maximum, count, sum, and log2 histogram of the lengths. + */ Review comment: should add `@since 2.0.0` javadoc tag. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#discussion_r150943748 ## File path: core/src/test/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizersTest.java ## @@ -0,0 +1,1171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License");you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.accumulo.core.client.summary.summarizers; + +import java.util.HashMap; + +import org.apache.accumulo.core.client.summary.summarizers.EntryLengthSummarizer; +import org.apache.accumulo.core.client.summary.Summarizer.Collector; +import org.apache.accumulo.core.client.summary.Summarizer.Combiner; +import org.apache.accumulo.core.client.summary.SummarizerConfiguration; +import org.apache.accumulo.core.data.Key; +import org.apache.accumulo.core.data.Value; +import org.junit.Assert; +import org.junit.Test; + +public class EntryLengthSummarizersTest { + + /* COLLECTOR TEST */ + /* Basic Test: Each test adds to the next, all are simple lengths. 
*/ + + @Test + public void testEmpty() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Collector collector = entrySum.collector(sc); + +HashMapstats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap expected = new HashMap<>(); +expected.put("key.min", 0L); +expected.put("key.max", 0L); +expected.put("key.sum", 0L); + +expected.put("row.min", 0L); +expected.put("row.max", 0L); +expected.put("row.sum", 0L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +expected.put("total", 0L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicRow() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new EntryLengthSummarizer(); + +Key k1 = new Key("r1"); +Key k2 = new Key("r2"); +Key k3 = new Key("r3"); + +Collector collector = entrySum.collector(sc); +collector.accept(k1, new Value("")); +collector.accept(k2, new Value("")); +collector.accept(k3, new Value("")); + +HashMap stats = new HashMap<>(); +collector.summarize(stats::put); + +HashMap expected = new HashMap<>(); +expected.put("key.min", 2L); +expected.put("key.max", 2L); +expected.put("key.sum", 6L); + +// Log2 Histogram +expected.put("key.logHist.1", 3L); + +expected.put("row.min", 2L); +expected.put("row.max", 2L); +expected.put("row.sum", 6L); + +// Log2 Histogram +expected.put("row.logHist.1", 3L); + +expected.put("family.min", 0L); +expected.put("family.max", 0L); +expected.put("family.sum", 0L); + +// 
Log2 Histogram +expected.put("family.logHist.0", 3L); + +expected.put("qualifier.min", 0L); +expected.put("qualifier.max", 0L); +expected.put("qualifier.sum", 0L); + +// Log2 Histogram +expected.put("qualifier.logHist.0", 3L); + +expected.put("visibility.min", 0L); +expected.put("visibility.max", 0L); +expected.put("visibility.sum", 0L); + +// Log2 Histogram +expected.put("visibility.logHist.0", 3L); + +expected.put("value.min", 0L); +expected.put("value.max", 0L); +expected.put("value.sum", 0L); + +// Log2 Histogram +expected.put("value.logHist.0", 3L); + +expected.put("total", 3L); + +Assert.assertEquals(expected, stats); + } + + @Test + public void testBasicFamily() { +SummarizerConfiguration sc = SummarizerConfiguration.builder(EntryLengthSummarizer.class).build(); +EntryLengthSummarizer entrySum = new
[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer URL: https://github.com/apache/accumulo/pull/320#discussion_r150937678 ## File path: core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License");you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.accumulo.core.client.summary.summarizers; + +import java.math.RoundingMode; +import java.util.Map; +import java.util.function.BiFunction; + +import org.apache.accumulo.core.client.summary.Summarizer; +import org.apache.accumulo.core.client.summary.SummarizerConfiguration; +import org.apache.accumulo.core.data.Key; +import org.apache.accumulo.core.data.Value; + +import com.google.common.math.IntMath; + +/** + * Summarizer that computes summary information about field lengths. + * Specifically key length, row length, family length, qualifier length, visibility length, and value length. + * Incrementally computes minimum, maximum, count, sum, and log2 histogram of the lengths. 
+ */ +public class EntryLengthSummarizer implements Summarizer { + + /* Helper function that calculates the various statistics that is used for the Collector methods.*/ + private static class LengthStats { +private long min = Long.MAX_VALUE; +private long max = Long.MIN_VALUE; +private long sum = 0; +private long[] counts = new long[32]; + +private void accept(int length) { + int idx; + + if (length < min) { +min = length; + } + + if (length > max) { +max = length; + } + + sum += length; + + if (length == 0) { +idx = 0; + } else { +idx = IntMath.log2(length, RoundingMode.HALF_UP); + } + + counts[idx]++; +} + +void summarize (String prefix, StatisticConsumer sc) { + sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0)); + sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0)); + sc.accept(prefix+".sum", sum); + + for (int i = 0; i < counts.length; i++) { +if (counts[i] > 0) { + sc.accept(prefix+".logHist."+i, counts[i]); +} + } +} + + } + + /* Helper functions for merging that is used by the Combiner. */ + private static void merge(String key, BiFunction<Long,Long,Long> mergeFunc, Map<String,Long> stats1, Map<String,Long> stats2) { +if (stats2.containsKey(key) && (stats1.containsKey(key) == false)) {

Review comment: I think Java's `merge()` function on map handles the case where a key is not present. So I think the following code will insert the value from stats2 if stats1 did not have a value (w/o calling mergeFunc).

```java
Long val2 = stats2.get(key);
if(val2 != null) {
  stats1.merge(key, val2, mergeFunc);
}
```

This also avoids calling `contains()` and then later `get()` on the map, which is two lookups.
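The suggestion in the review comment above can be tried in isolation. The following is a minimal sketch, assuming the statistics are held in `Map<String,Long>` as the reviewer's snippet implies; the class name `MergeSketch` and the sample statistic values are hypothetical and are not part of the pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class MergeSketch {

  // Merge the statistic stored under `key` in stats2 into stats1.
  // Map.merge stores val2 as-is when stats1 has no mapping for the key,
  // and only calls mergeFunc when stats1 already holds a value.
  static void merge(String key, BiFunction<Long,Long,Long> mergeFunc,
      Map<String,Long> stats1, Map<String,Long> stats2) {
    Long val2 = stats2.get(key); // single lookup instead of containsKey + get
    if (val2 != null) {
      stats1.merge(key, val2, mergeFunc);
    }
  }

  public static void main(String[] args) {
    Map<String,Long> stats1 = new HashMap<>();
    Map<String,Long> stats2 = new HashMap<>();
    stats1.put("row.max", 4L);
    stats2.put("row.max", 7L);
    stats2.put("row.min", 2L); // absent from stats1

    merge("row.max", Long::max, stats1, stats2); // both present: mergeFunc is applied
    merge("row.min", Long::min, stats1, stats2); // only in stats2: value is copied
    merge("row.sum", Long::sum, stats1, stats2); // in neither map: nothing happens

    System.out.println(stats1);
  }
}
```

The explicit null check is still needed because `Map.merge` throws a `NullPointerException` when the value argument is null.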
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150923657

## File path: tour/authorizations.md ##
@@ -0,0 +1,42 @@
+---
+title: Authorizations
+---
+An Authorization is an authority granted to a User that allows them to read certain data. A column in Accumulo can have
+a visibility attached to it, limiting access to the data of that column. Only users who have an authorization equal to
+the visibility will be able to read the data in that column.

Review comment: Could say `Only users with authorizations that satisfy a given visibility can read that column`
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150930217

## File path: tour/ranges-splits.md ##
@@ -0,0 +1,36 @@
+---
+title: Ranges and Splits
+---
+
+A Range is a specified group of Keys. There are many different ways to create a Range. Here are a few examples:
+```java
+new Range(Key startKey, Key endKey) // Creates a range from startKey inclusive to endKey inclusive.
+new Range(CharSequence row) // Creates a range that covers an entire row.
+new Range(CharSequence startRow, CharSequence endRow) // Creates a range from startRow inclusive to endRow inclusive.
+```
+
+A Scanner by default will scan all Keys in a table but this can be inefficient. It is a good practice to set a range on a Scanner.
+```java
+scanner.setRange(new Range("id", "id0010")); // returns rows from id to id0010
+```
+
+As your data grows larger, Accumulo will split tables across multiple servers called Tablet Servers.
+By default a table will get split on row boundaries, guaranteeing an entire row to be on one Tablet Server. We have the ability to
+tell Accumulo were to split tables by setting split points. This is done using _addSplits_ in the [TableOperations] API. The image below
+demonstrates how Accumulo splits data.
+
+![data distribution]({{ site.url }}/images/docs/data_distribution.png)
+
+There are many useful administrative methods in [TableOperations] so take a minute to look through the API. Here are some terms specific to Accumulo:

Review comment: This is good content, but it feels out of place here.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150920044

## File path: tour/data-model.md ##
@@ -0,0 +1,30 @@
+---
+title: Data Model
+---
+Data is stored in Accumulo in a sorted [TreeMap]. The Keys are broken up logically into a few different parts, as seen in the image below.
+
+![key value pair]({{ site.url }}/images/docs/key_value.png)
+
+**Row ID** - Unique identifier for the row.
+**Column Family** - Logical grouping of the key.

Review comment: Could mention that this field can be used to partition data within a node.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150928006

## File path: tour/ranges-splits.md ##
@@ -0,0 +1,36 @@
+---
+title: Ranges and Splits
+---
+
+A Range is a specified group of Keys. There are many different ways to create a Range. Here are a few examples:
+```java
+new Range(Key startKey, Key endKey) // Creates a range from startKey inclusive to endKey inclusive.
+new Range(CharSequence row) // Creates a range that covers an entire row.
+new Range(CharSequence startRow, CharSequence endRow) // Creates a range from startRow inclusive to endRow inclusive.
+```
+
+A Scanner by default will scan all Keys in a table but this can be inefficient. It is a good practice to set a range on a Scanner.
+```java
+scanner.setRange(new Range("id", "id0010")); // returns rows from id to id0010
+```
+
+As your data grows larger, Accumulo will split tables across multiple servers called Tablet Servers.

Review comment: I think `servers called ` could be dropped
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150919866

## File path: tour/data-model.md ##
@@ -0,0 +1,30 @@
+---
+title: Data Model
+---
+Data is stored in Accumulo in a sorted [TreeMap]. The Keys are broken up logically into a few different parts, as seen in the image below.
+
+![key value pair]({{ site.url }}/images/docs/key_value.png)
+
+**Row ID** - Unique identifier for the row.

Review comment: Could mention that this field is used to partition data across nodes.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150924130

## File path: tour/authorizations.md ##
@@ -0,0 +1,42 @@
+---
+title: Authorizations
+---
+An Authorization is an authority granted to a User that allows them to read certain data. A column in Accumulo can have

Review comment: Could say `Authorizations enable users to read protected data`.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150920891

## File path: tour/data-model.md ##
@@ -0,0 +1,30 @@
+---
+title: Data Model
+---
+Data is stored in Accumulo in a sorted [TreeMap]. The Keys are broken up logically into a few different parts, as seen in the image below.
+
+![key value pair]({{ site.url }}/images/docs/key_value.png)
+
+**Row ID** - Unique identifier for the row.
+**Column Family** - Logical grouping of the key.
+**Column Qualifier** - More specific attribute of the key.
+**Column Visibility** - Security label controlling access to the key/value pair.

Review comment: I think it would be useful to mention the timestamp along with all of the other fields and mention that it is used for versioning.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150932748

## File path: tour/ranges-splits.md ##
@@ -0,0 +1,36 @@
+---
+title: Ranges and Splits
+---
+
+A Range is a specified group of Keys. There are many different ways to create a Range. Here are a few examples:
+```java
+new Range(Key startKey, Key endKey) // Creates a range from startKey inclusive to endKey inclusive.
+new Range(CharSequence row) // Creates a range that covers an entire row.

Review comment: It's odd that Jekyll renders this `Range` as red but not the one on the line before.
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150919421

## File path: tour/data-model.md ##
@@ -0,0 +1,30 @@
+---
+title: Data Model
+---
+Data is stored in Accumulo in a sorted [TreeMap]. The Keys are broken up logically into a few different parts, as seen in the image below.

Review comment: I would not say Accumulo is a TreeMap, because it's misleading in many ways. I would just call it a "distributed sorted map".
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150925251

## File path: tour/authorizations.md ##
@@ -0,0 +1,42 @@
+---
+title: Authorizations
+---
+An Authorization is an authority granted to a User that allows them to read certain data. A column in Accumulo can have
+a visibility attached to it, limiting access to the data of that column. Only users who have an authorization equal to

Review comment: I was trying to think of how to shorten this... thought of the following fragment `limit access by setting a visibility label on a column` and gave up for now...
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150922810

## File path: tour/data-model-code.md ##
@@ -0,0 +1,65 @@
+---
+title: Data Model Code
+---
+
+```java
+// Connect to Mini Accumulo as the root user and create a table called "GothamPD".
+Connector conn = mac.getConnector("root", "tourguide");
+conn.tableOperations().create("GothamPD");
+
+// Create 3 Mutation objects to hold each person of interest.
+Mutation mutation1 = new Mutation("id0001");
+Mutation mutation2 = new Mutation("id0002");
+Mutation mutation3 = new Mutation("id0003");
+
+// Create key/value pairs for each Mutation, putting them in the appropriate family.
+mutation1.put("hero","alias", "Batman");
+mutation1.put("hero","name", "Bruce Wayne");
+mutation1.put("hero","wearsCape?", "true");
+mutation2.put("hero","alias", "Robin");
+mutation2.put("hero","name", "Dick Grayson");
+mutation2.put("hero","wearsCape?", "true");
+mutation3.put("villain","alias", "Joker");
+mutation3.put("villain","name", "Unknown");
+mutation3.put("villain","wearsCape?", "false");
+
+// Create a BatchWriter to the GothamPD table and add your mutations to it. Try w/ resources will close for us.
+try(BatchWriter writer = conn.createBatchWriter("GothamPD", new BatchWriterConfig())) {
+  writer.addMutation(mutation1);
+  writer.addMutation(mutation2);
+  writer.addMutation(mutation3);
+}
+
+// Read and print all rows of the "GothamPD" table. Try w/ resources will close for us.
+try(Scanner scan = conn.createScanner("GothamPD", Authorizations.EMPTY)) {
+  System.out.println("Gotham Police Department Persons of Interest:");
+  // A Scanner is an extension of java.lang.Iterable so behaves just like one.
+  for (Map.Entry<Key,Value> entry : scan) {
+    System.out.println("Key:" + entry.getKey());

Review comment: Personally I would do some formatting here... I would try the following and see which one I liked.

Could line up key and value

```java
System.out.println("Key   : " + entry.getKey());
System.out.println("Value : " + entry.getValue());
```

or could put them on one line with formatting

```java
System.out.printf("Key : %30s Value : %s\n", entry.getKey(), entry.getValue());
```

As long as all keys are shorter than 30 chars this should line up nicely...
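As a rough illustration of the one-line formatting idea, the sketch below applies the same `%30s` conversion through `String.format`. The class `FormatSketch` and the sample key strings are hypothetical stand-ins, not the output of Accumulo's `Key.toString()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FormatSketch {

  // Pad the key to a 30-character column (right-justified), then append the value.
  // Using %-30s instead would left-justify the key within the same column.
  static String formatEntry(Object key, Object value) {
    return String.format("Key : %30s Value : %s", key, value);
  }

  public static void main(String[] args) {
    // Hypothetical key strings; a real scan would supply Key and Value objects.
    Map<String,String> entries = new LinkedHashMap<>();
    entries.put("id0001 hero:alias", "Batman");
    entries.put("id0001 hero:wearsCape?", "true");
    entries.put("id0003 villain:name", "Unknown");

    for (Map.Entry<String,String> e : entries.entrySet()) {
      System.out.println(formatEntry(e.getKey(), e.getValue()));
    }
  }
}
```

Keys longer than 30 characters are not truncated by `%30s`; they simply push the value column to the right, which is why the alignment only holds for shorter keys.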
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150921482

## File path: tour/data-model-code.md ##
@@ -0,0 +1,65 @@
+---
+title: Data Model Code
+---
+
+```java
+// Connect to Mini Accumulo as the root user and create a table called "GothamPD".
+Connector conn = mac.getConnector("root", "tourguide");
+conn.tableOperations().create("GothamPD");
+
+// Create 3 Mutation objects to hold each person of interest.
+Mutation mutation1 = new Mutation("id0001");
+Mutation mutation2 = new Mutation("id0002");
+Mutation mutation3 = new Mutation("id0003");
+
+// Create key/value pairs for each Mutation, putting them in the appropriate family.
+mutation1.put("hero","alias", "Batman");
+mutation1.put("hero","name", "Bruce Wayne");
+mutation1.put("hero","wearsCape?", "true");
+mutation2.put("hero","alias", "Robin");
+mutation2.put("hero","name", "Dick Grayson");
+mutation2.put("hero","wearsCape?", "true");
+mutation3.put("villain","alias", "Joker");
+mutation3.put("villain","name", "Unknown");
+mutation3.put("villain","wearsCape?", "false");
+
+// Create a BatchWriter to the GothamPD table and add your mutations to it. Try w/ resources will close for us.

Review comment: Could mention that after close all data is written and visible to scans.
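The point about `close()` can be illustrated without an Accumulo instance. This is only a sketch under stated assumptions: `Table` and `BufferingWriter` are hypothetical stand-ins that mimic a writer which buffers entries and flushes them when a try-with-resources block closes it. They are not Accumulo classes, and a real BatchWriter may also send buffered data before `close()` is called.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseFlushSketch {

  // Hypothetical stand-in for a table: only flushed entries are visible to scan().
  static final class Table {
    private final List<String> entries = new ArrayList<>();

    List<String> scan() {
      return new ArrayList<>(entries);
    }
  }

  // Hypothetical stand-in for a batching writer: buffers until flush() or close().
  static final class BufferingWriter implements AutoCloseable {
    private final Table table;
    private final List<String> buffer = new ArrayList<>();

    BufferingWriter(Table table) {
      this.table = table;
    }

    void add(String entry) {
      buffer.add(entry);
    }

    void flush() {
      table.entries.addAll(buffer);
      buffer.clear();
    }

    @Override
    public void close() {
      flush(); // closing guarantees everything buffered has been written
    }
  }

  public static void main(String[] args) {
    Table table = new Table();
    try (BufferingWriter writer = new BufferingWriter(table)) {
      writer.add("id0001 hero:alias Batman");
      writer.add("id0002 hero:alias Robin");
      System.out.println("visible before close: " + table.scan().size());
    }
    // The try-with-resources block has called close(), so both entries are visible.
    System.out.println("visible after close: " + table.scan().size());
  }
}
```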
[GitHub] keith-turner commented on a change in pull request #38: Continue tour
keith-turner commented on a change in pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38#discussion_r150927221

## File path: tour/ranges-splits.md ##
@@ -0,0 +1,36 @@
+---
+title: Ranges and Splits
+---
+
+A Range is a specified group of Keys. There are many different ways to create a Range. Here are a few examples:
+```java
+new Range(Key startKey, Key endKey) // Creates a range from startKey inclusive to endKey inclusive.
+new Range(CharSequence row) // Creates a range that covers an entire row.
+new Range(CharSequence startRow, CharSequence endRow) // Creates a range from startRow inclusive to endRow inclusive.
+```
+
+A Scanner by default will scan all Keys in a table but this can be inefficient. It is a good practice to set a range on a Scanner.
+```java
+scanner.setRange(new Range("id", "id0010")); // returns rows from id to id0010
+```
+
+As your data grows larger, Accumulo will split tables across multiple servers called Tablet Servers.
+By default a table will get split on row boundaries, guaranteeing an entire row to be on one Tablet Server. We have the ability to
+tell Accumulo were to split tables by setting split points. This is done using _addSplits_ in the [TableOperations] API. The image below
+demonstrates how Accumulo splits data.

Review comment: Should mention Tablets in this paragraph.
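To make the relationship between split points and tablets concrete, here is a minimal sketch using only `java.util.TreeSet`. It assumes that each split point is the inclusive end row of a tablet and that rows sorting after the final split point fall into a last, unbounded tablet. The class `SplitSketch`, the label `"last"`, and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class SplitSketch {

  // Hypothetical label for the tablet that holds rows after the final split point.
  static final String LAST_TABLET = "last";

  // A row belongs to the tablet whose end row is the smallest split point
  // that is greater than or equal to the row.
  static String tabletFor(TreeSet<String> splits, String row) {
    String endRow = splits.ceiling(row);
    return endRow == null ? LAST_TABLET : endRow;
  }

  public static void main(String[] args) {
    // Two split points produce three tablets.
    TreeSet<String> splits = new TreeSet<>(Arrays.asList("id0003", "id0006"));

    for (String row : Arrays.asList("id0001", "id0003", "id0004", "id0007")) {
      System.out.println(row + " -> tablet ending at " + tabletFor(splits, row));
    }
  }
}
```

Because the lookup is keyed on the row alone, every entry of a given row lands in the same tablet, which matches the statement that tables are split on row boundaries.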
[jira] [Commented] (ACCUMULO-569) enable the monitor to show the current configuration
[ https://issues.apache.org/jira/browse/ACCUMULO-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251903#comment-16251903 ] Christopher Tubbs commented on ACCUMULO-569: This should be very easy now that there's a client API for this, and the new monitor for 2.0 is REST-based. > enable the monitor to show the current configuration > > > Key: ACCUMULO-569 > URL: https://issues.apache.org/jira/browse/ACCUMULO-569 > Project: Accumulo > Issue Type: New Feature >Reporter: Eric Newton >Priority: Minor > Fix For: 2.0.0 > > > As mentioned in ACCUMULO-123, display the current configuration (minus > passwords, of course) in the monitor. Basically, display the same thing you > can get in the shell. This is helpful for remote debugging with less > sophisticated users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
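The request in ACCUMULO-569 above (show the current configuration, minus passwords) amounts to filtering a property map before display. A minimal sketch follows; `ConfigViewSketch`, the `example.*` keys, and the rule for what counts as sensitive are all assumptions made for illustration and do not reflect the monitor's actual code or Accumulo's real property names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ConfigViewSketch {
  // Hypothetical rule: hide any property whose key mentions a password or secret.
  static boolean isSensitive(String key) {
    String lower = key.toLowerCase(Locale.ROOT);
    return lower.contains("password") || lower.contains("secret");
  }

  // Returns a sorted copy of the configuration with sensitive entries left out.
  static Map<String, String> displayable(Map<String, String> config) {
    Map<String, String> shown = new TreeMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (!isSensitive(entry.getKey())) {
        shown.put(entry.getKey(), entry.getValue());
      }
    }
    return shown;
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("example.port", "9995");       // illustrative key, safe to show
    config.put("example.password", "hunter2"); // illustrative key, must be hidden
    System.out.println(displayable(config));
  }
}
```

Omitting sensitive entries entirely, rather than masking their values, is one possible choice; a display that lists the key with a masked value would be an equally reasonable variant.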
[jira] [Updated] (ACCUMULO-569) enable the monitor to show the current configuration
[ https://issues.apache.org/jira/browse/ACCUMULO-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Tubbs updated ACCUMULO-569: --- Fix Version/s: 2.0.0 > enable the monitor to show the current configuration > > > Key: ACCUMULO-569 > URL: https://issues.apache.org/jira/browse/ACCUMULO-569 > Project: Accumulo > Issue Type: New Feature >Reporter: Eric Newton >Priority: Minor > Fix For: 2.0.0 > > > As mentioned in ACCUMULO-123, display the current configuration (minus > passwords, of course) in the monitor. Basically, display the same thing you > can get in the shell. This is helpful for remote debugging with less > sophisticated users.
[jira] [Commented] (ACCUMULO-4672) NPE extracting samplerConfiguration from InputSplit
[ https://issues.apache.org/jira/browse/ACCUMULO-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251837#comment-16251837 ] Aihua Xu commented on ACCUMULO-4672: HIVE-17373 is the JIRA where I'm trying to upgrade Accumulo to 1.8.1. I just uploaded a new patch, but when I tested it locally I noticed that we are hitting this NPE with the same stack trace. > NPE extracting samplerConfiguration from InputSplit > --- > > Key: ACCUMULO-4672 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4672 > Project: Accumulo > Issue Type: Bug > Components: mapreduce >Reporter: Josh Elser >Priority: Minor > Fix For: 1.8.2, 2.0.0 > > > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.accumulo.core.client.mapred.AbstractInputFormat$AbstractRecordReader.initialize(AbstractInputFormat.java:608) > at > org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat$1.initialize(AccumuloRowInputFormat.java:60) > at > org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat.getRecordReader(AccumuloRowInputFormat.java:84) > {noformat} > I still need to dig into this one and try to write a test case for it that > doesn't involve Hive (as it may have just been something that I was doing). > Best as I can tell.. > AbstractInputFormat extracts a default table configuration object from the > Job's Configuration class: > {code} > InputTableConfig tableConfig = getInputTableConfig(job, > baseSplit.getTableName()); > {code} > Eventually, the same class tries to extract the samplerConfiguration from > this tableConfig (after noticing it is not present in the InputSplit) and > this throws an NPE. Somehow the tableConfig was null. It very well could be > that Hive was to blame, I just wanted to make sure that this was captured > before I forgot about it.
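The failure mode described in ACCUMULO-4672 above (a per-table configuration lookup returns null and is dereferenced later) can be reproduced in miniature. The sketch below is not the Accumulo code and is not a proposed fix: `TableConfigLookupSketch`, `TableConfig`, and `samplerFor` are hypothetical names, and the sketch assumes the lookup returns null when nothing was stored for the table name. It shows one way to turn the late NPE into an early, descriptive error.

```java
import java.util.HashMap;
import java.util.Map;

class TableConfigLookupSketch {
  // Hypothetical stand-in for a per-table configuration object.
  static final class TableConfig {
    final String samplerName; // may be null when no sampler is configured

    TableConfig(String samplerName) {
      this.samplerName = samplerName;
    }
  }

  // Mimics a lookup that returns null when no config exists for the table.
  static TableConfig lookup(Map<String, TableConfig> configs, String table) {
    return configs.get(table);
  }

  // Fails fast with a descriptive message instead of a bare NullPointerException.
  static String samplerFor(Map<String, TableConfig> configs, String table) {
    TableConfig config = lookup(configs, table);
    if (config == null) {
      throw new IllegalStateException("No table config found for table '" + table + "'");
    }
    return config.samplerName;
  }

  public static void main(String[] args) {
    Map<String, TableConfig> configs = new HashMap<>();
    configs.put("known", new TableConfig("rowSampler"));
    System.out.println(samplerFor(configs, "known"));
    try {
      samplerFor(configs, "missing");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether the real code should throw, fall back to a default, or treat a missing config as "no sampler" is a separate decision that the issue itself leaves open.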
[GitHub] jmark99 commented on a change in pull request #37: Accumulo 4714 Create landing page for new developers
jmark99 commented on a change in pull request #37: Accumulo 4714 Create landing page for new developers URL: https://github.com/apache/accumulo-website/pull/37#discussion_r150912626 ## File path: pages/how-to-contribute.md ## @@ -0,0 +1,151 @@ +--- +title: How To Contribute +permalink: /how-to-contribute/ +redirect_from: /contributor/ +--- + +Contributions are welcome to all Apache Accumulo repositories. While most contributions are code, +there are other ways to contribute to Accumulo: + +* answer questions on mailing lists +* review pull requests +* verify and test new releases +* update the Accumulo website and documentation + +Contributions are reviewed (via GitHub pull requests) by +the community before being merged by a committer. + +This document provides basic instructions for contributing to Accumulo. If you are looking for more information, check out the more comprehensive [contributor guide](/contributors-guide/). + +## Issues + +Any contribution should have a corresponding issue. Accumulo uses [JIRA] for issue tracking. Before creating an issue, +you will need to create an [Apache JIRA account][jira-signup]. If you need help finding an issue to work on, check out +the [open issues labeled for newbies][newbie-issues] or [contact us][contact]. + +## Repositories + +Contributions can be made to the following repositories. While the general contribution workflow is +described below, repositories have special instructions in their `CONTRIBUTING.md` file which can be +viewed by clicking on `contribute` in the Links column below. 
+ +| Repository | Links| Description +| --- | | --- +| [Accumulo][a] | [contribute][ac] | Core Project +| [Accumulo Website][w] | [contribute][wc] | Source for this website +| [Accumulo Examples][e] | [contribute][ec] | Accumulo example code +| [Accumulo Testing][t] | [contribute][tc] | Accumulo test suites such as continuous ingest and random walk +| [Accumulo Docker][d]| [contribute][dc] | Source for Accumulo Docker image +| [Accumulo Wikisearch][s]| [contribute][sc] | Accumulo example application that indexes and queries Wikipedia data + +## Contribution workflow + +1. Create an [Apache JIRA account][jira-signup] (for issue tracking) and [GitHub account][github-join] (for pull requests). +1. Find an [issue][newbie-issues] to work on or create one that describes the work that you want to do. +1. [Fork] and [clone] the GitHub repository that you want to contribute to. +1. Create a branch in the local clone of your fork. +``` +git checkout -b accumulo-4321 +``` +1. Do work and commit to your branch. You can reference [this link][messages] for a guide on +to write good commit log messages in git. Review comment: @keith-turner, thanks for catching that.
[GitHub] keith-turner commented on a change in pull request #37: Accumulo 4714 Create landing page for new developers
keith-turner commented on a change in pull request #37: Accumulo 4714 Create landing page for new developers URL: https://github.com/apache/accumulo-website/pull/37#discussion_r150908111 ## File path: pages/how-to-contribute.md ## @@ -0,0 +1,151 @@ +--- +title: How To Contribute +permalink: /how-to-contribute/ +redirect_from: /contributor/ +--- + +Contributions are welcome to all Apache Accumulo repositories. While most contributions are code, +there are other ways to contribute to Accumulo: + +* answer questions on mailing lists +* review pull requests +* verify and test new releases +* update the Accumulo website and documentation + +Contributions are reviewed (via GitHub pull requests) by +the community before being merged by a committer. + +This document provides basic instructions for contributing to Accumulo. If you are looking for more information, check out the more comprehensive [contributor guide](/contributors-guide/). + +## Issues + +Any contribution should have a corresponding issue. Accumulo uses [JIRA] for issue tracking. Before creating an issue, +you will need to create an [Apache JIRA account][jira-signup]. If you need help finding an issue to work on, check out +the [open issues labeled for newbies][newbie-issues] or [contact us][contact]. + +## Repositories + +Contributions can be made to the following repositories. While the general contribution workflow is +described below, repositories have special instructions in their `CONTRIBUTING.md` file which can be +viewed by clicking on `contribute` in the Links column below. 
+ +| Repository | Links| Description +| --- | | --- +| [Accumulo][a] | [contribute][ac] | Core Project +| [Accumulo Website][w] | [contribute][wc] | Source for this website +| [Accumulo Examples][e] | [contribute][ec] | Accumulo example code +| [Accumulo Testing][t] | [contribute][tc] | Accumulo test suites such as continuous ingest and random walk +| [Accumulo Docker][d]| [contribute][dc] | Source for Accumulo Docker image +| [Accumulo Wikisearch][s]| [contribute][sc] | Accumulo example application that indexes and queries Wikipedia data + +## Contribution workflow + +1. Create an [Apache JIRA account][jira-signup] (for issue tracking) and [GitHub account][github-join] (for pull requests). +1. Find an [issue][newbie-issues] to work on or create one that describes the work that you want to do. +1. [Fork] and [clone] the GitHub repository that you want to contribute to. +1. Create a branch in the local clone of your fork. +``` +git checkout -b accumulo-4321 +``` +1. Do work and commit to your branch. You can reference [this link][messages] for a guide on +to write good commit log messages in git. Review comment: Need to correct `...for a guide on to write good commit...`