[jira] [Commented] (ACCUMULO-4669) RFile can create very large blocks when key statistics are not uniform
[ https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069323#comment-16069323 ]

Christopher Tubbs commented on ACCUMULO-4669:
----------------------------------------------

No, it wouldn't prevent them from getting into the index entirely... but it would allow a best effort to stay within some reasonable distance of the user-configured block size (after the soft limit, before the hard limit).

> RFile can create very large blocks when key statistics are not uniform
> -----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoids writing them into the index. This check is flawed and can result in multi-GB blocks. In our case, a 20GB compressed RFile had one block with over 2GB raw size. This happened because the key size statistics changed after some point in the file. The code in question follows:
> {code}
> private boolean isGiantKey(Key k) {
>   // consider a key that's more than 3 standard deviations from previously seen key sizes as giant
>   return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
> }
> ...
> if (blockWriter == null) {
>   blockWriter = fileWriter.prepareDataBlock();
> } else if (blockWriter.getRawSize() > blockSize) {
>   ...
>   if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
>     closeBlock(prevKey, false);
>     ...
> {code}
> Before closing a block that has grown beyond the target block size, we check that the key is below average in size or that the block has grown past 1.1 times the target block size (maxBlockSize), and we check that the key isn't a "giant" key, i.e. more than 3 standard deviations above the mean of the key sizes seen so far.
> Our RFiles often have one row of data with different column families representing various forward and inverted indexes. This is a table design similar to the WikiSearch example. The first column family in this case had very uniform, relatively small key sizes. This first column family comprised gigabytes of data, split up into roughly 100KB blocks. When we switched to the next column family the keys grew in size, but were still under about 100 bytes. The statistics of the first column family had firmly established a smaller mean and a tiny standard deviation (approximately 0), and it took over 2GB of larger keys to bring the standard deviation up enough that keys were no longer considered "giant" and the block could be closed.
> Now that we're aware, we see large blocks (more than 10x the target block size) in almost every RFile we write. This only became a glaring problem when we got OOM exceptions trying to decompress the block, but it also shows up in a number of subtle performance problems, like high variance in latencies for looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the greater of 2x the maximum key/value size in the block and some configurable threshold, such as 1.1 times the compressed block size. We need a firm cap to be able to reason about memory usage in various applications.
> The following code produces arbitrarily large RFile blocks:
> {code}
> FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
> writer.startDefaultLocalityGroup();
> SummaryStatistics keyLenStats = new SummaryStatistics();
> Random r = new Random();
> byte[] buffer = new byte[minRowSize];
> for (int i = 0; i < 10; i++) {
>   byte[] valBytes = new byte[valLength];
>   r.nextBytes(valBytes);
>   r.nextBytes(buffer);
>   ByteBuffer.wrap(buffer).putInt(i);
>   Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
>   Value v = new Value(valBytes);
>   writer.append(k, v);
>   keyLenStats.addValue(k.getSize());
>   int newBufferSize = Math.max(buffer.length, (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
>   buffer = new byte[newBufferSize];
>   if (keyLenStats.getSum() > targetSize)
>     break;
> }
> writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryException thrown from a readahead thread with message "Requested array size exceeds VM limit". This will only happen if the block cache size is big enough to hold the expected raw block size, 2GB in our
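To make the failure mode concrete, here is a small, self-contained sketch (an editorial addition, not Accumulo code): it only mimics the shape of the quoted isGiantKey() check, and assumes commons-math3 on the classpath for SummaryStatistics, as in the reproduction snippet. After a long run of keys with a perfectly uniform size, a key that is merely twice as large keeps tripping the heuristic until a very large number of the bigger keys have been recorded.

{code}
// Demonstrates why a long run of uniform key sizes makes the 3-sigma
// "giant key" heuristic reject ordinary keys for a very long time.
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class GiantKeyHeuristicDemo {

  // Same shape as the isGiantKey() check quoted from RFile.Writer
  static boolean looksGiant(SummaryStatistics keyLenStats, int keySize) {
    return keySize > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
  }

  public static void main(String[] args) {
    SummaryStatistics keyLenStats = new SummaryStatistics();

    // "First column family": one million keys, all exactly 40 bytes.
    // The mean settles at 40 and the standard deviation is essentially 0.
    for (int i = 0; i < 1_000_000; i++) {
      keyLenStats.addValue(40);
    }

    // "Second column family": keys are still tiny (80 bytes), but every one
    // of them is more than 3 standard deviations above the mean, so the
    // block-close condition keeps failing and the data block keeps growing.
    int largerKeySize = 80;
    System.out.printf("mean=%.1f stddev=%.6f looksGiant(80)=%b%n",
        keyLenStats.getMean(), keyLenStats.getStandardDeviation(),
        looksGiant(keyLenStats, largerKeySize));

    // Count how many 80-byte keys it takes before the running standard
    // deviation grows enough that they stop being classified as giant.
    long added = 0;
    while (looksGiant(keyLenStats, largerKeySize)) {
      keyLenStats.addValue(largerKeySize);
      added++;
    }
    System.out.println("80-byte keys appended before the heuristic stops firing: " + added);
  }
}
{code}

With these illustrative numbers the loop runs for over a hundred thousand iterations before the 80-byte keys stop looking "giant", which mirrors the reporter's observation that gigabytes of slightly larger keys were needed before a block could finally be closed.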
[jira] [Commented] (ACCUMULO-4669) RFile can create very large blocks when key statistics are not uniform
[ https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069287#comment-16069287 ]

Keith Turner commented on ACCUMULO-4669:
-----------------------------------------

{quote}
I think the isGiant heuristic is worth keeping, but we should reintroduce a hard limit on the data block size.
{quote}

The goal of the isGiant check was to prevent really large (megabyte-sized) keys from making it into the index. I was dealing with keys that followed a Zipfian distribution. The hard limit would not prevent these keys from making it into the index. When these keys get into the index it can be a disaster, because they may be pushed up the index tree. Maybe the isGiant check should be dropped if it can't accomplish its intended goal without making large data blocks.

> RFile can create very large blocks when key statistics are not uniform
> -----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
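As an editorial aside, the hard limit being discussed might look something like the sketch below. This is illustrative only and not the committed Accumulo fix; the class, the method names, and the 2x hard-limit factor are assumptions chosen to show how a hard cap can coexist with the giant-key heuristic.

{code}
/**
 * Sketch: keep the giant-key heuristic for index quality, but add an
 * unconditional hard cap so a skewed key-size distribution can no longer
 * produce multi-GB data blocks. Illustrative only -- not Accumulo source.
 */
final class BlockClosePolicy {

  private final long targetBlockSize;   // configured data block size
  private final long softMaxBlockSize;  // 1.1x target, as in the quoted code
  private final long hardMaxBlockSize;  // e.g. 2x target: never exceeded

  BlockClosePolicy(long targetBlockSize) {
    this.targetBlockSize = targetBlockSize;
    this.softMaxBlockSize = (long) (targetBlockSize * 1.1);
    this.hardMaxBlockSize = targetBlockSize * 2;
  }

  /**
   * @param rawBlockSize   bytes written to the current data block so far
   * @param keySize        size of the key that would close the block
   * @param averageKeySize running mean of key sizes
   * @param looksGiant     result of an isGiantKey()-style heuristic
   */
  boolean shouldCloseBlock(long rawBlockSize, int keySize, double averageKeySize, boolean looksGiant) {
    if (rawBlockSize <= targetBlockSize) {
      return false; // block has not yet reached the target size
    }
    if (rawBlockSize > hardMaxBlockSize) {
      return true;  // hard cap: close regardless of the giant-key heuristic
    }
    // Below the hard cap, the original soft rules still apply.
    boolean softRule = keySize <= averageKeySize || rawBlockSize > softMaxBlockSize;
    return softRule && !looksGiant;
  }
}
{code}

Under a policy like this, a giant key can still be kept out of the index for blocks that are only modestly over the target size, but the worst-case block size is bounded, which addresses the reporter's request for a firm cap that applications can reason about.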
[jira] [Updated] (ACCUMULO-4672) NPE extracting samplerConfiguration from InputSplit
[ https://issues.apache.org/jira/browse/ACCUMULO-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-4672:
-----------------------------------------
    Fix Version/s: 2.0.0

> NPE extracting samplerConfiguration from InputSplit
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4672
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4672
>             Project: Accumulo
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Minor
>             Fix For: 1.8.2, 2.0.0
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.accumulo.core.client.mapred.AbstractInputFormat$AbstractRecordReader.initialize(AbstractInputFormat.java:608)
>     at org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat$1.initialize(AccumuloRowInputFormat.java:60)
>     at org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat.getRecordReader(AccumuloRowInputFormat.java:84)
> {noformat}
> I still need to dig into this one and try to write a test case for it that doesn't involve Hive (as it may have just been something that I was doing). Best as I can tell...
> AbstractInputFormat extracts a default table configuration object from the Job's Configuration class:
> {code}
> InputTableConfig tableConfig = getInputTableConfig(job, baseSplit.getTableName());
> {code}
> Eventually, the same class tries to extract the samplerConfiguration from this tableConfig (after noticing it is not present in the InputSplit), and this throws an NPE. Somehow the tableConfig was null. It very well could be that Hive was to blame; I just wanted to make sure that this was captured before I forgot about it.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
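For context, the flow described above suggests the fix is a null guard around the fallback path. The sketch below is an editorial illustration, not the actual patch: getInputTableConfig() and baseSplit.getTableName() come from the quoted snippet, while the sampler-configuration accessors are assumed names.

{code}
// Hypothetical null-guarded fallback, following the flow described in the report.
SamplerConfiguration samplerConfig = baseSplit.getSamplerConfiguration(); // may be absent from the split
if (samplerConfig == null) {
  InputTableConfig tableConfig = getInputTableConfig(job, baseSplit.getTableName());
  // The per-table config can be missing when the job was configured by an external
  // framework (e.g. Hive), so check for null instead of dereferencing it blindly.
  if (tableConfig != null) {
    samplerConfig = tableConfig.getSamplerConfiguration();
  }
}
{code}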
[jira] [Assigned] (ACCUMULO-4043) start-up needs to scale with the number of servers
[ https://issues.apache.org/jira/browse/ACCUMULO-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bella reassigned ACCUMULO-4043:
-------------------------------------
    Assignee:     (was: Ivan Bella)

> start-up needs to scale with the number of servers
> ---------------------------------------------------
>
>                 Key: ACCUMULO-4043
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4043
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.6.4
>        Environment: very large production cluster
>            Reporter: Eric Newton
>             Fix For: 2.0.0
>
>
> When starting a very large production cluster, the master starts up quickly enough that it loads the metadata table before all the tservers have registered and are running. The result is a very long balancing period after all the servers have started. The wait period for tablet server stabilization needs to scale somewhat with the number of tablet servers.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
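One way the stabilization wait could scale with cluster size is sketched below. This is an editorial illustration under assumed names and constants, not Accumulo's master code: the idea is simply to keep waiting while new tablet servers are still registering, with the required quiet period growing with the number of servers seen so far and capped so the wait cannot grow without bound on very large clusters.

{code}
// Illustrative sketch of a cluster-size-aware startup wait (not Accumulo code).
final class TserverStabilizationWait {

  interface LiveServerCounter {
    int countLiveTservers(); // e.g. backed by the tserver registrations in ZooKeeper
  }

  static void waitForStableTservers(LiveServerCounter counter) throws InterruptedException {
    final long pollMillis = 1_000;
    int lastCount = counter.countLiveTservers();
    long quietMillis = 0;

    while (true) {
      Thread.sleep(pollMillis);
      int count = counter.countLiveTservers();
      if (count > lastCount) {
        lastCount = count;   // still growing: reset the quiet period
        quietMillis = 0;
      } else {
        quietMillis += pollMillis;
      }
      // Larger clusters must stay quiet for longer (50 ms per registered server
      // on top of a 10 s base), with the required quiet period capped at 2 minutes.
      long requiredQuietMillis = Math.min(120_000, 10_000 + 50L * lastCount);
      if (quietMillis >= requiredQuietMillis) {
        return;
      }
    }
  }
}
{code}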