[jira] [Created] (HBASE-27639) Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 3.2.3
Nihal Jain created HBASE-27639:
-----------------------------------

             Summary: Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 3.2.3
                 Key: HBASE-27639
                 URL: https://issues.apache.org/jira/browse/HBASE-27639
             Project: HBase
          Issue Type: Improvement
            Reporter: Nihal Jain
            Assignee: Nihal Jain
[jira] [Reopened] (HBASE-26850) Optimize the implementation of LRUCache in LRUDictionary
[ https://issues.apache.org/jira/browse/HBASE-26850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tianhang tang reopened HBASE-26850:
-----------------------------------

> Optimize the implementation of LRUCache in LRUDictionary
> ---------------------------------------------------------
>
>                 Key: HBASE-26850
>                 URL: https://issues.apache.org/jira/browse/HBASE-26850
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>    Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>            Reporter: tianhang tang
>            Assignee: tianhang tang
>            Priority: Major
>         Attachments: image-2022-03-16-17-13-00-871.png
>
>
> *This issue is to unify the behavior of the read and write paths.*
>
> During my research on HBASE-26849, I found that there seems to be something wrong with the implementation of LRUDictionary. It uses an array to store the data, and a HashMap to map each node to its array index.
> The write path is:
> {code:java}
> CompressedKvEncoder#write
>   -> LRUDictionary#findEntry
>   -> LRUDictionary#addEntry
> {code}
> And the read path is:
> {code:java}
> CompressedKvDecoder#readIntoArray
>   -> LRUDictionary#addEntry
> {code}
> Now look at the logic in {_}findEntry{_}:
> {code:java}
> @Override
> public short findEntry(byte[] data, int offset, int length) {
>   short ret = backingStore.findIdx(data, offset, length);
>   if (ret == NOT_IN_DICTIONARY) {
>     addEntry(data, offset, length);
>   }
>   return ret;
> }
>
> private short findIdx(byte[] array, int offset, int length) {
>   Short s;
>   final Node comparisonNode = new Node();
>   comparisonNode.setContents(array, offset, length);
>   if ((s = nodeToIndex.get(comparisonNode)) != null) {
>     moveToHead(indexToNode[s]);
>     return s;
>   } else {
>     return -1;
>   }
> }
> {code}
> First it tries to find an identical node in the cache; if one exists, it is reused directly.
> The problem is: *if we add identical nodes, the behavior of the read and write paths is inconsistent.* On the write path a single node is reused, but on the read path multiple copies are stored in the array, while the map holds only one mapping from the node to an array index.
> So, on the read path, if the first of the identical nodes is evicted, its mapping is removed from the map, and we can then hit an NPE when the second identical node is evicted, because it can no longer be found in the map.
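For illustration, here is a minimal, self-contained sketch of the failure mode described above. It is not HBase code: the array-plus-map bookkeeping mimics LRUDictionary's backing store, with String keys standing in for Node content equality, and all names are hypothetical.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for LRUDictionary's backing store: entries live in a
// fixed-size array, and a map from entry contents to array index serves lookups.
public class DuplicateEntryDemo {
  static final int CAPACITY = 2;
  static final byte[][] slots = new byte[CAPACITY][];
  static final Map<String, Integer> contentsToIndex = new HashMap<>();
  static int next = 0;

  // Read-path behavior: always append, even when identical contents are cached.
  static void addEntry(byte[] data) {
    int idx = next % CAPACITY;
    if (slots[idx] != null) {
      // Evict: drop the mapping of the entry being overwritten.
      contentsToIndex.remove(new String(slots[idx]));
    }
    slots[idx] = data;
    // A second identical add overwrites the first mapping, so TWO array
    // copies now share ONE map entry.
    contentsToIndex.put(new String(data), idx);
    next++;
  }

  public static void main(String[] args) {
    addEntry("aa".getBytes()); // slot 0, map: aa -> 0
    addEntry("aa".getBytes()); // slot 1, map: aa -> 1 (mapping for slot 0 lost)
    addEntry("bb".getBytes()); // evicts slot 0 and removes the ONLY "aa" mapping
    // Slot 1 still holds "aa", but the map no longer knows about it; in
    // LRUDictionary this is where the NPE on the next eviction comes from.
    System.out.println("mapping for aa: " + contentsToIndex.get("aa")); // null
  }
}
{code}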
[jira] [Created] (HBASE-27638) Get slow/large log responses that match 'CLIENT_IP' without client port
mokai created HBASE-27638:
-----------------------------------

             Summary: Get slow/large log responses that match 'CLIENT_IP' without client port
                 Key: HBASE-27638
                 URL: https://issues.apache.org/jira/browse/HBASE-27638
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.4.14
            Reporter: mokai

'get_largelog_responses' and 'get_slowlog_responses' support filtering the records by a given client via 'CLIENT_IP', but the user has to provide both the client IP and the client port. Since clients mostly use ephemeral ports, it would be a better user experience if 'CLIENT_IP' without a port were supported; all records matching the client IP would then be returned.
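A minimal sketch of the relaxed matching (a hypothetical helper, not the actual slow-log filter code): when the filter value carries no ':port', compare only the IP part of the recorded client address.

{code:java}
// Hypothetical predicate illustrating IP-only matching for slow/large log
// filters; class and method names are illustrative, not HBase APIs.
public class ClientIpMatcher {
  /**
   * @param recorded client address stored in the log entry, e.g. "10.0.0.5:51234"
   * @param filter   user-supplied CLIENT_IP value, with or without ":port"
   */
  static boolean matches(String recorded, String filter) {
    if (filter.contains(":")) {
      return recorded.equals(filter); // current behavior: exact ip:port match
    }
    // Proposed behavior: strip the port from the recorded address and
    // compare IPs only.
    int sep = recorded.lastIndexOf(':');
    String recordedIp = sep >= 0 ? recorded.substring(0, sep) : recorded;
    return recordedIp.equals(filter);
  }

  public static void main(String[] args) {
    System.out.println(matches("10.0.0.5:51234", "10.0.0.5:51234")); // true
    System.out.println(matches("10.0.0.5:51234", "10.0.0.5"));       // true (proposal)
    System.out.println(matches("10.0.0.6:51234", "10.0.0.5"));       // false
  }
}
{code}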
[jira] [Created] (HBASE-27637) Zero length value would cause value compressor to read nothing and not advance the position of the InputStream
Duo Zhang created HBASE-27637:
-----------------------------------

             Summary: Zero length value would cause value compressor to read nothing and not advance the position of the InputStream
                 Key: HBASE-27637
                 URL: https://issues.apache.org/jira/browse/HBASE-27637
             Project: HBase
          Issue Type: Bug
          Components: dataloss, wal
            Reporter: Duo Zhang

This is a code snippet from the discussion of HBASE-27073:

{code:java}
import java.io.ByteArrayInputStream;

import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.util.LRUDictionary;
import org.apache.hadoop.hbase.regionserver.wal.CompressionContext;
import org.apache.hadoop.hbase.regionserver.wal.CompressionContext.ValueCompressor;

public class ZeroLengthValueDemo {
  public static void main(String[] args) throws Exception {
    CompressionContext ctx =
      new CompressionContext(LRUDictionary.class, false, false, true, Compression.Algorithm.GZ);
    ValueCompressor compressor = ctx.getValueCompressor();
    // Compress an empty value: this still emits compressor framing bytes.
    byte[] compressed = compressor.compress(new byte[0], 0, 0);
    System.out.println("compressed length: " + compressed.length);
    ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
    // Decompress it back: nothing is read and the stream does not advance.
    int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
    System.out.println("read length: " + read);
    System.out.println("position: " + (compressed.length - bis.available()));
  }
}
{code}

And the output is:

{noformat}
compressed length: 20
read length: 0
position: 0
{noformat}

So it turns out that when compressing, an empty array still generates some output bytes, but when reading, we skip reading anything once we see that the value length is zero. So the next time we read from the stream, we start at the wrong position...
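Those 20 bytes are exactly what plain GZIP produces for an empty payload (10-byte header, an empty deflate block, 8-byte trailer). A standalone sketch using only java.util.zip, independent of the HBase classes above, shows why a reader that skips zero-length values desynchronizes the stream:

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class EmptyGzipDemo {
  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // Write zero payload bytes; closing still writes the gzip header and trailer.
    new GZIPOutputStream(bos).close();
    // Prints 20: even an empty value occupies bytes in the stream, so a
    // reader that skips it leaves the stream positioned 20 bytes early.
    System.out.println("gzip of empty input: " + bos.size() + " bytes");
  }
}
{code}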
[jira] [Resolved] (HBASE-27590) Change Iterable to List in SnapshotFileCache
[ https://issues.apache.org/jira/browse/HBASE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi resolved HBASE-27590.
-----------------------------------
    Resolution: Fixed

Cherry-picked to branch-2.4.

> Change Iterable to List in SnapshotFileCache
> ---------------------------------------------
>
>                 Key: HBASE-27590
>                 URL: https://issues.apache.org/jira/browse/HBASE-27590
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Peter Somogyi
>            Assignee: Peter Somogyi
>            Priority: Minor
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>         Attachments: flame-1.html
>
>
> The HFileCleaners can have low performance on a large /archive area when used with slow storage like S3. The snapshot write lock in SnapshotFileCache is held while the file metadata is fetched from S3. Because of this, even with multiple cleaner threads, only a single cleaner can effectively delete files from the archive.
> File metadata collection can be performed before SnapshotHFileCleaner runs, just by changing the parameter type passed to FileCleanerDelegate from Iterable to List.
> Running with the cleaner configuration below, I observed that the lock held in SnapshotFileCache went down from 45000ms to 100ms for a directory of 1000 files. The complete evaluation and deletion for this folder took the same time, but since the file metadata fetch from S3 was done outside of the lock, the multiple cleaner threads were able to run concurrently.
> {noformat}
> hbase.cleaner.directory.sorting=false
> hbase.cleaner.scan.dir.concurrent.size=0.75
> hbase.regionserver.hfilecleaner.small.thread.count=16
> hbase.regionserver.hfilecleaner.large.thread.count=8
> {noformat}
> The files to evaluate are already passed as a List to CleanerChore.checkAndDeleteFiles, but it is converted to an Iterable to run the checks on the configured cleaners.
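The shape of the change, as a sketch: the real delegate is FileCleanerDelegate, but the before/after method pair below is illustrative and shown side by side in one hypothetical interface.

{code:java}
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

// Illustrative sketch of the signature change described above.
interface FileCleanerDelegateSketch {
  // Before: a lazy Iterable, so per-file metadata fetches (e.g. against S3)
  // can happen while SnapshotFileCache's write lock is held.
  Iterable<FileStatus> getDeletableFilesBefore(Iterable<FileStatus> files);

  // After: a fully materialized List, letting CleanerChore collect the file
  // metadata up front, before SnapshotHFileCleaner takes the lock.
  Iterable<FileStatus> getDeletableFilesAfter(List<FileStatus> files);
}
{code}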
[jira] [Reopened] (HBASE-27590) Change Iterable to List in SnapshotFileCache
[ https://issues.apache.org/jira/browse/HBASE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi reopened HBASE-27590:
-----------------------------------

> Change Iterable to List in SnapshotFileCache
> ---------------------------------------------
[jira] [Resolved] (HBASE-27629) Backport HBASE-27043 to branch-2.4
[ https://issues.apache.org/jira/browse/HBASE-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi resolved HBASE-27629.
-----------------------------------
    Fix Version/s: 2.4.17
       Resolution: Fixed

Merged to branch-2.4.

> Backport HBASE-27043 to branch-2.4
> -----------------------------------
>
>                 Key: HBASE-27629
>                 URL: https://issues.apache.org/jira/browse/HBASE-27629
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Peter Somogyi
>            Assignee: Peter Somogyi
>            Priority: Major
>             Fix For: 2.4.17
[jira] [Created] (HBASE-27636) The value of the region metric lastMajorCompactionAge is wrong when the hfile is a bulkload file
selina.yan created HBASE-27636:
-----------------------------------

             Summary: The value of the region metric lastMajorCompactionAge is wrong when the hfile is a bulkload file
                 Key: HBASE-27636
                 URL: https://issues.apache.org/jira/browse/HBASE-27636
             Project: HBase
          Issue Type: Bug
          Components: hbase-connectors
            Reporter: selina.yan
            Assignee: selina.yan

When HFileOutputFormat2 is used to create an hfile, fileCreateTime is not assigned when the HFileContext is built. As a result, getOldestHfileTs() sees a create time of 0 for the bulkloaded file, so after bulkload the region's lastMajorCompactionAge metric becomes now - 0, i.e. the current timestamp.

{code:java}
// HFileOutputFormat2.class
HFileContextBuilder contextBuilder = new HFileContextBuilder()
  .withCompression(compression)
  .withChecksumType(HStore.getChecksumType(conf))
  .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
  .withBlockSize(blockSize);
if (HFile.getFormatVersion(conf) >= HFile.MIN_FORMAT_VERSION_WITH_TAGS) {
  contextBuilder.withIncludesTags(true);
}
contextBuilder.withDataBlockEncoding(encoding);
HFileContext hFileContext = contextBuilder.build();

// computing the lastMajorCompactionAge metric
long lastMajorCompactionTs = 0L;
try {
  lastMajorCompactionTs = this.region.getOldestHfileTs(true);
} catch (IOException ioe) {
  LOG.error("Could not load HFile info ", ioe);
}
long now = EnvironmentEdgeManager.currentTime();
return now - lastMajorCompactionTs;
...

public long getOldestHfileTs(boolean majorCompactionOnly) throws IOException {
  long result = Long.MAX_VALUE;
  for (HStore store : stores.values()) {
    Collection<HStoreFile> storeFiles = store.getStorefiles();
    ...
    for (HStoreFile file : storeFiles) {
      StoreFileReader sfReader = file.getReader();
      ...
      result = Math.min(result, reader.getFileContext().getFileCreateTime());
    }
  }
  return result == Long.MAX_VALUE ? 0 : result;
}
{code}
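A possible fix is to stamp the creation time in HFileOutputFormat2 when building the context. This is a sketch under the assumption that HFileContextBuilder exposes a withCreateTime(long) setter, as the store-file write path uses; the wrapper class and method are illustrative only.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.regionserver.HStore;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

public class BulkloadCreateTimeSketch {
  // Stamp a real creation time on bulkload-produced hfiles so that
  // getOldestHfileTs() no longer falls back to 0.
  static HFileContext buildContext(Configuration conf, Compression.Algorithm compression,
      int blockSize) {
    return new HFileContextBuilder()
      .withCompression(compression)
      .withChecksumType(HStore.getChecksumType(conf))
      .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
      .withBlockSize(blockSize)
      .withCreateTime(EnvironmentEdgeManager.currentTime()) // assumed setter
      .build();
  }
}
{code}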