Accumulo-Master - Build # 2178 - Still Failing

2017-11-09 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Master (build #2178)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-Master/2178/ to 
view the results.

[GitHub] ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150140321
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   Sure, no rush. I mostly wanted to point out that the feature is built-in.
   
   Like yours, there is a chance of collisions so both are effectively 
per-entry but not absolutely. In `ConcurrentHashMap` it locks on the bin which 
contains a small number of entries. The likelihood of collision on a write is 
stated to be...
   > Lock contention probability for two threads accessing distinct elements is 
roughly 1 / (8 * #elements) under random hashes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150139499
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   So multiple loads can run concurrently as along as they are for different 
cache keys?  Is this what you meant by a per-entry lock?
   
   I will refactor the code to use computeIfAbsent, but it may be a few days.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150137270
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   I am not sure.  I saw a comment somewhere that said load operations should 
be *fast*, which is why I avoided relying on Caffine.   I can't remember where 
I saw that.  I am certainly open to refactoring the code to use Caffeine for 
load synchronization.   I tried looking at the internals of Caffine to see 
where and how it did its load sync, but got lost.  Any tips would be 
appreciated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150138225
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   Yes, the internals are unfriendly to glance through. Sadly performant code 
with lots of features quickly becomes gnarly.
   
   A `get(k, loader)` or `Map.computeIfAbsent` delegate to an internal 
[computeIfAbsent](https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1991)
 method. This does a non-blocking get, validates the entry if found, and does a 
blocking `computeIfAbsent` if absent or expired. This way hot entries are a 
cheap read.
   
   ConcurrentHashMap does say computations should be fast, which we propagate 
into our docs too. I take it as a good reminder than a red flag warning, since 
it will be a blocking call when computed. More along the lines of keep your 
lock hold times short, or don't abuse db transactions for batch imports, rather 
than use locks or transactions sparingly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
keith-turner commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150137270
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   I am not sure.  I saw a comment somewhere that said load operations should 
be *fast*, which is why I avoided relying on the cache.   I can't remember 
where I saw that.  I am certainly open to refactoring the code to use Caffeine 
for load synchronization.   I tried looking at the internals of the code to see 
where and how it did its load sync, but got lost.  Any tips would be 
appreciated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
ben-manes commented on a change in pull request #321: ACCUMULO-4641 Added 
loading to cache API
URL: https://github.com/apache/accumulo/pull/321#discussion_r150136161
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/tinylfu/TinyLfuBlockCache.java
 ##
 @@ -47,7 +47,7 @@
  * Cache design: 
http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
  * 
  */
-public final class TinyLfuBlockCache implements BlockCache {
+public final class TinyLfuBlockCache extends SynchronousLoadingBlockCache 
implements BlockCache {
 
 Review comment:
   The underlying cache already does memoization using a per-entry lock. Is 
this needed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (ACCUMULO-4462) Refactor cache code to avoid dog piliing

2017-11-09 Thread Keith Turner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner resolved ACCUMULO-4462.

Resolution: Duplicate

> Refactor cache code to avoid dog piliing
> 
>
> Key: ACCUMULO-4462
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4462
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Keith Turner
>
> This issue was identified by [~ben.manes] in discussion on ACCUMULO-4177.  
> For more details see [Ben's 
> comment|https://issues.apache.org/jira/browse/ACCUMULO-4177?focusedCommentId=15486074=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15486074]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ACCUMULO-4641) Modify BlockCache interface to avoid race conditions

2017-11-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ACCUMULO-4641:
-
Labels: pull-request-available  (was: )

> Modify BlockCache interface to avoid race conditions
> 
>
> Key: ACCUMULO-4641
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4641
> Project: Accumulo
>  Issue Type: Sub-task
>Reporter: Keith Turner
>Assignee: Keith Turner
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> Currently the BlockCache interface has functions to get and put.  Accumulo 
> will try to get a block, if it does not exist load it, and then put it in the 
> cache.  This can lead to race conditions where multiple threads unnecessarily 
> load the same block.
> I think it would be better to modify the block cache interface to only have a 
> function like the following.  
> {code:java}
>   CacheEntry get(String blockName, BlockLoader loader)
> {code} 
> BlockLoader represents a function that the cache can call if a block is not 
> present.  The cache implementation can attempt to handle load race conditions 
> however it likes..



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] keith-turner opened a new pull request #321: ACCUMULO-4641 Added loading to cache API

2017-11-09 Thread GitBox
keith-turner opened a new pull request #321: ACCUMULO-4641 Added loading to 
cache API
URL: https://github.com/apache/accumulo/pull/321
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r150121088
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for 
the Collector methods.*/
+  private static class LengthStats {
+private long min = Long.MAX_VALUE;
+private long max = Long.MIN_VALUE;
+private long sum = 0;
+private long[] counts = new long[32];
+
+private void accept(int length) {
+  int idx;
+
+  if (length < min) {
+min = length;
+  }
+
+  if (length > max) {
+max = length;
+  }
+
+  sum += length;
+
+  if (length == 0) {
+idx = 0;
+  } else {
+idx = IntMath.log2(length, RoundingMode.HALF_UP);
+  }
+
+  counts[idx]++;
+}
+
+void summarize (String prefix, StatisticConsumer sc) {
+  sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+  sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+  sc.accept(prefix+".sum", sum);
+
+  for (int i = 0; i < counts.length; i++) {
+if (counts[i] > 0) {
+  sc.accept(prefix+".logHist."+i, counts[i]);
+}
+  }
+}
+
+  }
+
+  /* Helper function for merging that is used by the Combiner. */
+  private static void merge(String prefix, Map stats1, 
Map stats2 ) {
+stats1.merge(prefix+".min", stats2.get(prefix+".min"), Long::max);
+stats1.merge(prefix+".max", stats2.get(prefix+".max"), Long::max);
+stats1.merge(prefix+".sum", stats2.get(prefix+".sum"), Long::sum);
+stats2.forEach((k,v) -> stats1.merge(k, v, Long::sum));
 
 Review comment:
   I wasn't sure why the test were not catching this so I cloned the branch to 
experiment locally.  Realized the test are not calling this code, I completely 
missed that.  Below are the beginnings of a test I wrote while looking into 
this.
   
   ```java
 @Test
 public void testFoo() {
   SummarizerConfiguration sc = 
SummarizerConfiguration.builder(EntryLengthSummarizer.class).build();
   EntryLengthSummarizer entrySum = new EntryLengthSummarizer();
   
   Collector collector1 = entrySum.collector(sc);
   collector1.accept(new Key("1","f1","q1"), new Value("v1"));
   collector1.accept(new Key("1234","f1","q1"), new Value("v111"));
   collector1.accept(new Key("12345678","f1","q1"), new Value("v11"));
   
   Map stats1 = new TreeMap<>();
   collector1.summarize(stats1::put);
   
   Collector collector2 = entrySum.collector(sc);
   collector2.accept(new Key("5432","f11","q12"), new Value("2"));
   collector2.accept(new Key("12","f11","q1234"), new Value("12"));
   collector2.accept(new Key("12","f11","q11234567"), new Value(""));
   
   Map stats2 = new TreeMap<>();
   collector2.summarize(stats2::put);
   
   System.out.println("= Stats 1");
   stats1.entrySet().forEach(System.out::println);
   System.out.println("= Stats 2");
   stats2.entrySet().forEach(System.out::println);
   
   Combiner combiner = entrySum.combiner(sc);
   combiner.merge(stats1, stats2);
   
   System.out.println("= Merged");
   

[GitHub] keith-turner commented on issue #318: ACCUMULO-4733 Implement configurable security provider for crypto

2017-11-09 Thread GitBox
keith-turner commented on issue #318: ACCUMULO-4733 Implement configurable 
security provider for crypto
URL: https://github.com/apache/accumulo/pull/318#issuecomment-343320936
 
 
   @PircDef it would be nice to update the release notes w/ info about these 
new config.  
https://github.com/apache/accumulo-website/blob/master/_posts/release/2017-09-05-accumulo-2.0.0.md
   Would you be interested in doing this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Accumulo-Master - Build # 2177 - Still Failing

2017-11-09 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Master (build #2177)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-Master/2177/ to 
view the results.

[GitHub] keith-turner commented on issue #318: ACCUMULO-4733 Implement configurable security provider for crypto

2017-11-09 Thread GitBox
keith-turner commented on issue #318: ACCUMULO-4733 Implement configurable 
security provider for crypto
URL: https://github.com/apache/accumulo/pull/318#issuecomment-343306327
 
 
   Merged and squashed in 314f9f2b2f70d28a52cb699e1c65b85e4d6be7e3


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner closed pull request #318: ACCUMULO-4733 Implement configurable security provider for crypto

2017-11-09 Thread GitBox
keith-turner closed pull request #318: ACCUMULO-4733 Implement configurable 
security provider for crypto
URL: https://github.com/apache/accumulo/pull/318
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/java/org/apache/accumulo/core/conf/Property.java 
b/core/src/main/java/org/apache/accumulo/core/conf/Property.java
index e08df9e363..6980af1232 100644
--- a/core/src/main/java/org/apache/accumulo/core/conf/Property.java
+++ b/core/src/main/java/org/apache/accumulo/core/conf/Property.java
@@ -59,6 +59,9 @@
   CRYPTO_CIPHER_KEY_LENGTH("crypto.cipher.key.length", "128", 
PropertyType.STRING,
   "Specifies the key length *in bits* to use for the symmetric key, should 
probably be 128 or 256 unless you really know what you're doing"),
   @Experimental
+  CRYPTO_SECURITY_PROVIDER("crypto.security.provider", "", PropertyType.STRING,
+  "States the security provider to use, and defaults to the system 
configured provider"),
+  @Experimental
   CRYPTO_SECURE_RNG("crypto.secure.rng", "SHA1PRNG", PropertyType.STRING,
   "States the secure random number generator to use, and defaults to the 
built-in Sun SHA1PRNG"),
   @Experimental
diff --git 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CachingHDFSSecretKeyEncryptionStrategy.java
 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CachingHDFSSecretKeyEncryptionStrategy.java
index 7b79d99ec4..10feb2ae14 100644
--- 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CachingHDFSSecretKeyEncryptionStrategy.java
+++ 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CachingHDFSSecretKeyEncryptionStrategy.java
@@ -69,7 +69,8 @@ public CryptoModuleParameters 
decryptSecretKey(CryptoModuleParameters context) {
   }
 
   private void doKeyEncryptionOperation(int encryptionMode, 
CryptoModuleParameters params) throws IOException {
-Cipher cipher = 
DefaultCryptoModuleUtils.getCipher(params.getAllOptions().get(Property.CRYPTO_DEFAULT_KEY_STRATEGY_CIPHER_SUITE.getKey()));
+Cipher cipher = 
DefaultCryptoModuleUtils.getCipher(params.getAllOptions().get(Property.CRYPTO_DEFAULT_KEY_STRATEGY_CIPHER_SUITE.getKey()),
+params.getSecurityProvider());
 
 try {
   cipher.init(encryptionMode, new 
SecretKeySpec(secretKeyCache.getKeyEncryptionKey(), params.getAlgorithmName()));
diff --git 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleFactory.java
 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleFactory.java
index 6a295adbce..c2954248e5 100644
--- 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleFactory.java
+++ 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleFactory.java
@@ -286,6 +286,7 @@ public static CryptoModuleParameters 
fillParamsObjectFromStringMap(CryptoModuleP
 params.setPadding(cipherTransformParts[2]);
 
params.setRandomNumberGenerator(cryptoOpts.get(Property.CRYPTO_SECURE_RNG.getKey()));
 
params.setRandomNumberGeneratorProvider(cryptoOpts.get(Property.CRYPTO_SECURE_RNG_PROVIDER.getKey()));
+
params.setSecurityProvider(cryptoOpts.get(Property.CRYPTO_SECURITY_PROVIDER.getKey()));
 String blockStreamSize = 
cryptoOpts.get(Property.CRYPTO_BLOCK_STREAM_SIZE.getKey());
 if (blockStreamSize != null)
   params.setBlockStreamSize(Integer.parseInt(blockStreamSize));
diff --git 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleParameters.java
 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleParameters.java
index a7bb93dfdd..b1d5024982 100644
--- 
a/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleParameters.java
+++ 
b/core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleParameters.java
@@ -232,6 +232,26 @@ public void setRandomNumberGeneratorProvider(String 
randomNumberGeneratorProvide
   }
 
   /**
+   * Gets the security provider name.
+   *
+   * @see CryptoParameters#setSecurityProvider(String)
+   * @return the security provider name
+   */
+  public String getSecurityProvider() {
+return securityProvider;
+  }
+
+  /**
+   * Sets the name of the security provider to use for crypto.
+   *
+   * @param securityProvider
+   *  the name of the provider to use
+   */
+  public void setSecurityProvider(String securityProvider) {
+this.securityProvider = securityProvider;
+  }
+
+  /**
* Gets the key encryption strategy class.
*
* @see CryptoModuleParameters#setKeyEncryptionStrategyClass(String)
@@ -609,6 +629,7 @@ public void setAllOptions(Map allOptions) {
   private int keyLength = 0;
   private String randomNumberGenerator = null;
   private String 

[GitHub] keith-turner closed pull request #319: ACCUMULO-4737 Clean up cipher algorithm configuration

2017-11-09 Thread GitBox
keith-turner closed pull request #319: ACCUMULO-4737 Clean up cipher algorithm 
configuration
URL: https://github.com/apache/accumulo/pull/319
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java 
b/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
index 130863c961..bf84c650e9 100644
--- a/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
+++ b/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
@@ -17,6 +17,7 @@
 package org.apache.accumulo.core.conf;
 
 import java.util.Map.Entry;
+import java.util.Objects;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -46,6 +47,8 @@
   public static void validate(Iterable> entries) {
 String instanceZkTimeoutValue = null;
 boolean usingVolumes = false;
+String cipherSuite = null;
+String keyAlgorithm = null;
 for (Entry entry : entries) {
   String key = entry.getKey();
   String value = entry.getValue();
@@ -66,6 +69,14 @@ else if (!prop.getType().isValidFormat(value))
   if (key.equals(Property.INSTANCE_VOLUMES.getKey())) {
 usingVolumes = value != null && !value.isEmpty();
   }
+
+  if (key.equals(Property.CRYPTO_CIPHER_SUITE.getKey())) {
+cipherSuite = Objects.requireNonNull(value);
+  }
+
+  if (key.equals(Property.CRYPTO_CIPHER_KEY_ALGORITHM_NAME.getKey())) {
+keyAlgorithm = Objects.requireNonNull(value);
+  }
 }
 
 if (instanceZkTimeoutValue != null) {
@@ -75,6 +86,14 @@ else if (!prop.getType().isValidFormat(value))
 if (!usingVolumes) {
   log.warn("Use of {} and {} are deprecated. Consider using {} instead.", 
INSTANCE_DFS_URI, INSTANCE_DFS_DIR, Property.INSTANCE_VOLUMES);
 }
+
+if (cipherSuite.equals("NullCipher") && 
!keyAlgorithm.equals("NullCipher")) {
+  fatal(Property.CRYPTO_CIPHER_SUITE.getKey() + " should be configured 
when " + Property.CRYPTO_CIPHER_KEY_ALGORITHM_NAME.getKey() + " is set.");
+}
+
+if (!cipherSuite.equals("NullCipher") && 
keyAlgorithm.equals("NullCipher")) {
+  fatal(Property.CRYPTO_CIPHER_KEY_ALGORITHM_NAME.getKey() + " should be 
configured when " + Property.CRYPTO_CIPHER_SUITE.getKey() + " is set.");
+}
   }
 
   private interface CheckTimeDuration {
diff --git a/core/src/main/java/org/apache/accumulo/core/conf/Property.java 
b/core/src/main/java/org/apache/accumulo/core/conf/Property.java
index e08df9e363..f0e233894f 100644
--- a/core/src/main/java/org/apache/accumulo/core/conf/Property.java
+++ b/core/src/main/java/org/apache/accumulo/core/conf/Property.java
@@ -48,10 +48,18 @@
   "Fully qualified class name of the class that implements the 
CryptoModule interface, to be used in setting up encryption at rest for the WAL 
and "
   + "(future) other parts of the code."),
   @Experimental
-  CRYPTO_CIPHER_SUITE("crypto.cipher.suite", "NullCipher", 
PropertyType.STRING, "Describes the cipher suite to use for the write-ahead 
log"),
+  CRYPTO_CIPHER_SUITE("crypto.cipher.suite", "NullCipher", PropertyType.STRING,
+  "Describes the cipher suite to use for rfile encryption. If a WAL cipher 
suite is not set, it will default to this value. The suite should be in the "
+  + "form of algorithm/mode/padding, e.g. AES/CBC/NoPadding"),
   @Experimental
-  CRYPTO_CIPHER_ALGORITHM_NAME("crypto.cipher.algorithm.name", "NullCipher", 
PropertyType.STRING,
-  "States the name of the algorithm used in the corresponding cipher 
suite. Do not make these different, unless you enjoy mysterious exceptions and 
bugs."),
+  CRYPTO_WAL_CIPHER_SUITE(
+  "crypto.wal.cipher.suite",
+  "NullCipher",
+  PropertyType.STRING,
+  "Describes the cipher suite to use for the write-ahead log. Defaults to 
'cyrpto.cipher.suite' and will use that value for WAL encryption unless 
otherwise specified."),
+  @Experimental
+  CRYPTO_CIPHER_KEY_ALGORITHM_NAME("crypto.cipher.key.algorithm.name", 
"NullCipher", PropertyType.STRING,
+  "States the name of the algorithm used for the key for the corresponding 
cipher suite. The key type must be compatible with the cipher suite."),
   @Experimental
   CRYPTO_BLOCK_STREAM_SIZE("crypto.block.stream.size", "1K", 
PropertyType.BYTES,
   "The size of the buffer above the cipher stream. Used for reading files 
and padding walog entries."),
diff --git 
a/core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/BCFile.java 
b/core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/BCFile.java
index 5cfe8242ea..212e1f671d 100644
--- a/core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/BCFile.java
+++ 

[GitHub] keith-turner commented on issue #319: ACCUMULO-4737 Clean up cipher algorithm configuration

2017-11-09 Thread GitBox
keith-turner commented on issue #319: ACCUMULO-4737 Clean up cipher algorithm 
configuration
URL: https://github.com/apache/accumulo/pull/319#issuecomment-343305949
 
 
   Merged and squashed in daeffd8d9cf980814fb7d131a0d89cbdfd856298


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner closed pull request #300: ACCUMULO-4708 Limit RFile block size to 2GB

2017-11-09 Thread GitBox
keith-turner closed pull request #300: ACCUMULO-4708 Limit RFile block size to 
2GB
URL: https://github.com/apache/accumulo/pull/300
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java 
b/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
index 130863c961..9c2ed6c377 100644
--- a/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
+++ b/core/src/main/java/org/apache/accumulo/core/conf/ConfigSanityCheck.java
@@ -21,6 +21,8 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import com.google.common.base.Preconditions;
+
 /**
  * A utility class for validating {@link AccumuloConfiguration} instances.
  */
@@ -66,6 +68,13 @@ else if (!prop.getType().isValidFormat(value))
   if (key.equals(Property.INSTANCE_VOLUMES.getKey())) {
 usingVolumes = value != null && !value.isEmpty();
   }
+
+  // If the block size or block size index is configured to be too large, 
we throw an exception to avoid potentially corrupting RFiles later
+  if (key.equals(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE_INDEX.getKey()) 
|| key.equals(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE.getKey())) {
+long bsize = ConfigurationTypeHelper.getFixedMemoryAsBytes(value);
+Preconditions.checkArgument(bsize > 0 && bsize < Integer.MAX_VALUE, 
key + " must be greater than 0 and less than " + Integer.MAX_VALUE + " but was: 
"
++ bsize);
+  }
 }
 
 if (instanceZkTimeoutValue != null) {
diff --git a/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java 
b/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java
index c1931daed3..c399a22da8 100644
--- a/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java
+++ b/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java
@@ -404,7 +404,7 @@ public void flushIfNeeded() throws IOException {
 private SampleLocalityGroupWriter sample;
 
 private SummaryStatistics keyLenStats = new SummaryStatistics();
-private double avergageKeySize = 0;
+private double averageKeySize = 0;
 
 LocalityGroupWriter(BlockFileWriter fileWriter, long blockSize, long 
maxBlockSize, LocalityGroupMetadata currentLocalityGroup,
 SampleLocalityGroupWriter sample) {
@@ -441,19 +441,27 @@ public void append(Key key, Value value) throws 
IOException {
   } else if (blockWriter.getRawSize() > blockSize) {
 
 // Look for a key thats short to put in the index, defining short as 
average or below.
-if (avergageKeySize == 0) {
+if (averageKeySize == 0) {
   // use the same average for the search for a below average key for a 
block
-  avergageKeySize = keyLenStats.getMean();
+  averageKeySize = keyLenStats.getMean();
 }
 
 // Possibly produce a shorter key that does not exist in data. Even if 
a key can be shortened, it may not be below average.
 Key closeKey = KeyShortener.shorten(prevKey, key);
 
-if ((closeKey.getSize() <= avergageKeySize || blockWriter.getRawSize() 
> maxBlockSize) && !isGiantKey(closeKey)) {
+if ((closeKey.getSize() <= averageKeySize || blockWriter.getRawSize() 
> maxBlockSize) && !isGiantKey(closeKey)) {
   closeBlock(closeKey, false);
   blockWriter = fileWriter.prepareDataBlock();
   // set average to zero so its recomputed for the next block
-  avergageKeySize = 0;
+  averageKeySize = 0;
+  // To constrain the growth of data blocks, we limit our worst case 
scenarios to closing
+  // blocks if they reach the maximum configurable block size of 
Integer.MAX_VALUE.
+  // 128 bytes added for metadata overhead
+} else if (((long) key.getSize() + (long) value.getSize() + 
blockWriter.getRawSize() + 128L) >= Integer.MAX_VALUE) {
+  closeBlock(closeKey, false);
+  blockWriter = fileWriter.prepareDataBlock();
+  averageKeySize = 0;
+
 }
   }
 
diff --git 
a/core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java 
b/core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java
index 4d1af7ef7d..195da93122 100644
--- 
a/core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java
+++ 
b/core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java
@@ -38,6 +38,8 @@
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
+import com.google.common.base.Preconditions;
+
 public class RFileOperations extends FileOperations {
 
   private static final Collection EMPTY_CF_SET = 
Collections.emptySet();
@@ -82,7 +84,11 @@ protected FileSKVWriter 

[GitHub] keith-turner commented on issue #300: ACCUMULO-4708 Limit RFile block size to 2GB

2017-11-09 Thread GitBox
keith-turner commented on issue #300: ACCUMULO-4708 Limit RFile block size to 
2GB
URL: https://github.com/apache/accumulo/pull/300#issuecomment-343305662
 
 
   Squashed and merged this in e875f01ed519100225bdf3a1550c40caccd8521f
   
   Thanks for the contribution @PircDef.  Would you like to added to the 
Accumulo people page.  If so let me know what info you would like added or 
submit a PR to the accumulo-website repo modifying the people page.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] PircDef commented on a change in pull request #319: ACCUMULO-4737 Clean up cipher algorithm configuration

2017-11-09 Thread GitBox
PircDef commented on a change in pull request #319: ACCUMULO-4737 Clean up 
cipher algorithm configuration
URL: https://github.com/apache/accumulo/pull/319#discussion_r150048681
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/security/crypto/CryptoModuleParameters.java
 ##
 @@ -56,73 +56,45 @@ public String getAlgorithmName() {
* decryption. 
* For decryption, this value is often disregarded in favor of the 
value encoded with the ciphertext.
*
-   * @param algorithmName
+   * @param keyAlgorithmName
*  the name of the cryptographic algorithm to use.
* @see http://docs.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html#AppA;>Standard
 Algorithm Names in JCE
*
*/
 
-  public void setAlgorithmName(String algorithmName) {
-this.algorithmName = algorithmName;
+  public void setKeyAlgorithmName(String keyAlgorithmName) {
+this.keyAlgorithmName = keyAlgorithmName;
 
 Review comment:
   Added the check to the sanity config check.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r150036637
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for 
the Collector methods.*/
+  private static class LengthStats {
+private long min = Long.MAX_VALUE;
+private long max = Long.MIN_VALUE;
+private long sum = 0;
+private long[] counts = new long[32];
+
+private void accept(int length) {
+  int idx;
+
+  if (length < min) {
+min = length;
+  }
+
+  if (length > max) {
+max = length;
+  }
+
+  sum += length;
+
+  if (length == 0) {
+idx = 0;
+  } else {
+idx = IntMath.log2(length, RoundingMode.HALF_UP);
+  }
+
+  counts[idx]++;
+}
+
+void summarize (String prefix, StatisticConsumer sc) {
+  sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+  sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+  sc.accept(prefix+".sum", sum);
+
+  for (int i = 0; i < counts.length; i++) {
+if (counts[i] > 0) {
+  sc.accept(prefix+".logHist."+i, counts[i]);
+}
+  }
+}
+
+  }
+
+  /* Helper function for merging that is used by the Combiner. */
+  private static void merge(String prefix, Map stats1, 
Map stats2 ) {
+stats1.merge(prefix+".min", stats2.get(prefix+".min"), Long::max);
+stats1.merge(prefix+".max", stats2.get(prefix+".max"), Long::max);
+stats1.merge(prefix+".sum", stats2.get(prefix+".sum"), Long::sum);
+stats2.forEach((k,v) -> stats1.merge(k, v, Long::sum));
 
 Review comment:
   Now looking at it again it does seem it shouldn't be working but why it is I 
don't exactly know, but again whether max, min, sum or null it returns sams 
values.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r150035212
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for 
the Collector methods.*/
+  private static class LengthStats {
+private long min = Long.MAX_VALUE;
+private long max = Long.MIN_VALUE;
+private long sum = 0;
+private long[] counts = new long[32];
+
+private void accept(int length) {
+  int idx;
+
+  if (length < min) {
+min = length;
+  }
+
+  if (length > max) {
+max = length;
+  }
+
+  sum += length;
+
+  if (length == 0) {
+idx = 0;
+  } else {
+idx = IntMath.log2(length, RoundingMode.HALF_UP);
+  }
+
+  counts[idx]++;
+}
+
+void summarize (String prefix, StatisticConsumer sc) {
+  sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+  sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+  sc.accept(prefix+".sum", sum);
+
+  for (int i = 0; i < counts.length; i++) {
+if (counts[i] > 0) {
+  sc.accept(prefix+".logHist."+i, counts[i]);
+}
+  }
+}
+
+  }
+
+  /* Helper function for merging that is used by the Combiner. */
+  private static void merge(String prefix, Map stats1, 
Map stats2 ) {
+stats1.merge(prefix+".min", stats2.get(prefix+".min"), Long::max);
+stats1.merge(prefix+".max", stats2.get(prefix+".max"), Long::max);
+stats1.merge(prefix+".sum", stats2.get(prefix+".sum"), Long::sum);
+stats2.forEach((k,v) -> stats1.merge(k, v, Long::sum));
 
 Review comment:
   I'm not sure I can articulate this correctly but I will try. I believe that 
since there is only one value passed into the merge functions that the 
remapping function recomputes the same value passed in. If that makes sense. 
Either way though, I played around with that during my initial coding, and it 
doesn't in this particular case seem to make any difference. I could make that 
parameter "null"  if that would be better, the output will be the same 
according to the test. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r150035212
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for 
the Collector methods.*/
+  private static class LengthStats {
+private long min = Long.MAX_VALUE;
+private long max = Long.MIN_VALUE;
+private long sum = 0;
+private long[] counts = new long[32];
+
+private void accept(int length) {
+  int idx;
+
+  if (length < min) {
+min = length;
+  }
+
+  if (length > max) {
+max = length;
+  }
+
+  sum += length;
+
+  if (length == 0) {
+idx = 0;
+  } else {
+idx = IntMath.log2(length, RoundingMode.HALF_UP);
+  }
+
+  counts[idx]++;
+}
+
+void summarize (String prefix, StatisticConsumer sc) {
+  sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+  sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+  sc.accept(prefix+".sum", sum);
+
+  for (int i = 0; i < counts.length; i++) {
+if (counts[i] > 0) {
+  sc.accept(prefix+".logHist."+i, counts[i]);
+}
+  }
+}
+
+  }
+
+  /* Helper function for merging that is used by the Combiner. */
+  private static void merge(String prefix, Map stats1, 
Map stats2 ) {
+stats1.merge(prefix+".min", stats2.get(prefix+".min"), Long::max);
+stats1.merge(prefix+".max", stats2.get(prefix+".max"), Long::max);
+stats1.merge(prefix+".sum", stats2.get(prefix+".sum"), Long::sum);
+stats2.forEach((k,v) -> stats1.merge(k, v, Long::sum));
 
 Review comment:
   I'm not sure I can articulate this correctly but I will try. I believe that 
since their is only one value passed into the merge functions that the 
remapping function recomputes the same value passed in. If that makes sense. 
Either way though, I played around with that during my initial coding, and it 
doesn't in this particular case seem to make any difference. I could make that 
parameter "null"  if that would be better, the output will be the same 
according to the test. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] milleruntime opened a new pull request #38: Continue tour

2017-11-09 Thread GitBox
milleruntime opened a new pull request #38: Continue tour
URL: https://github.com/apache/accumulo-website/pull/38
 
 
   Exercises added for:
   Auths
   data model
   ranges and splits
   batch scanner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
keith-turner commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r150026207
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+import java.util.Map;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  /* Helper function that calculates the various statistics that is used for 
the Collector methods.*/
+  private static class LengthStats {
+private long min = Long.MAX_VALUE;
+private long max = Long.MIN_VALUE;
+private long sum = 0;
+private long[] counts = new long[32];
+
+private void accept(int length) {
+  int idx;
+
+  if (length < min) {
+min = length;
+  }
+
+  if (length > max) {
+max = length;
+  }
+
+  sum += length;
+
+  if (length == 0) {
+idx = 0;
+  } else {
+idx = IntMath.log2(length, RoundingMode.HALF_UP);
+  }
+
+  counts[idx]++;
+}
+
+void summarize (String prefix, StatisticConsumer sc) {
+  sc.accept(prefix+".min", (min != Long.MAX_VALUE ? min:0));
+  sc.accept(prefix+".max", (max != Long.MIN_VALUE ? max:0));
+  sc.accept(prefix+".sum", sum);
+
+  for (int i = 0; i < counts.length; i++) {
+if (counts[i] > 0) {
+  sc.accept(prefix+".logHist."+i, counts[i]);
+}
+  }
+}
+
+  }
+
+  /* Helper function for merging that is used by the Combiner. */
+  private static void merge(String prefix, Map stats1, 
Map stats2 ) {
+stats1.merge(prefix+".min", stats2.get(prefix+".min"), Long::max);
+stats1.merge(prefix+".max", stats2.get(prefix+".max"), Long::max);
+stats1.merge(prefix+".sum", stats2.get(prefix+".sum"), Long::sum);
+stats2.forEach((k,v) -> stats1.merge(k, v, Long::sum));
 
 Review comment:
   It seems like this will sum everything, even the min and max keys.  I 
suspect this is only intended to sum log histogram keys?If so it needs a 
check for the key before doing merge.  Also, it seems like the test should 
catch this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created EntryLengthSummarizer

2017-11-09 Thread GitBox
jkrdev commented on a change in pull request #320: ACCUMULO-4730 Created 
EntryLengthSummarizer
URL: https://github.com/apache/accumulo/pull/320#discussion_r149952335
 
 

 ##
 File path: 
core/src/main/java/org/apache/accumulo/core/client/summary/summarizers/EntryLengthSummarizer.java
 ##
 @@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License");you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.core.client.summary.summarizers;
+
+import java.math.RoundingMode;
+
+import org.apache.accumulo.core.client.summary.Summarizer;
+import org.apache.accumulo.core.client.summary.SummarizerConfiguration;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+
+import com.google.common.math.IntMath;
+
+/**
+ * Summarizer that computes summary information about field lengths.
+ * Specifically key length, row length, family length, qualifier length, 
visibility length, and value length.
+ * Incrementally computes minimum, maximum, count, sum, and log2 histogram of 
the lengths.
+ */
+public class EntryLengthSummarizer implements Summarizer {
+
+  public static final String MIN_KEY_STAT = "minKey";
+  public static final String MAX_KEY_STAT = "maxKey";
+  public static final String SUM_KEYS_STAT = "sumKeys";
+
+  public static final String MIN_ROW_STAT = "minRow";
+  public static final String MAX_ROW_STAT = "maxRow";
+  public static final String SUM_ROWS_STAT = "sumRows";
+
+  public static final String MIN_FAMILY_STAT = "minFamily";
+  public static final String MAX_FAMILY_STAT = "maxFamily";
+  public static final String SUM_FAMILIES_STAT = "sumFamilies";
+
+  public static final String MIN_QUALIFIER_STAT = "minQualifier";
+  public static final String MAX_QUALIFIER_STAT = "maxQualifier";
+  public static final String SUM_QUALIFIERS_STAT = "sumQualifiers";
+
+  public static final String MIN_VISIBILITY_STAT = "minVisibility";
+  public static final String MAX_VISIBILITY_STAT = "maxVisibility";
+  public static final String SUM_VISIBILITIES_STAT = "sumVisibilities";
+
+  public static final String MIN_VALUE_STAT = "minValue";
+  public static final String MAX_VALUE_STAT = "maxValue";
+  public static final String SUM_VALUES_STAT = "sumValues";
+
+  public static final String TOTAL_STAT = "total";// Total number of Keys
+
 
 Review comment:
   Thanks I will implement this, definitely will make the code cleaner.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services