[jira] [Created] (HBASE-27639) Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 3.2.3

2023-02-13 Thread Nihal Jain (Jira)
Nihal Jain created HBASE-27639:
--

 Summary: Support hbase-connectors compilation with HBase 2.5.3, 
Hadoop 3.2.4 and Spark 3.2.3
 Key: HBASE-27639
 URL: https://issues.apache.org/jira/browse/HBASE-27639
 Project: HBase
  Issue Type: Improvement
Reporter: Nihal Jain
Assignee: Nihal Jain






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-26850) Optimize the implementation of LRUCache in LRUDictionary

2023-02-13 Thread tianhang tang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianhang tang reopened HBASE-26850:
---

> Optimize the implementation of LRUCache in LRUDictionary
> 
>
> Key: HBASE-26850
> URL: https://issues.apache.org/jira/browse/HBASE-26850
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>Reporter: tianhang tang
>Assignee: tianhang tang
>Priority: Major
> Attachments: image-2022-03-16-17-13-00-871.png
>
>
> *This issue is to unify the behavior of the read and write paths.*
>  
> During my research on HBASE-26849, I found that there seems to be something 
> wrong with the implementation of LRUDictionary.
> It uses an array to store the data, and a HashMap to map each node to its 
> array index.
> The write path is:
> {code:java}
> CompressedKvEncoder#write
> ->
> LRUDictionary#findEntry
> ->
> LRUDictionary#addEntry
> {code}
> And the read path is:
> {code:java}
> CompressedKvDecoder#readIntoArray
> ->
> LRUDictionary#addEntry
> {code}
> The logic in {_}findEntry{_} is:
> {code:java}
> @Override
> public short findEntry(byte[] data, int offset, int length) {
>   short ret = backingStore.findIdx(data, offset, length);
>   if (ret == NOT_IN_DICTIONARY) {
>     addEntry(data, offset, length);
>   }
>   return ret;
> }
> 
> private short findIdx(byte[] array, int offset, int length) {
>   Short s;
>   final Node comparisonNode = new Node();
>   comparisonNode.setContents(array, offset, length);
>   if ((s = nodeToIndex.get(comparisonNode)) != null) {
>     moveToHead(indexToNode[s]);
>     return s;
>   } else {
>     return -1;
>   }
> }
> {code}
> First it checks whether an equal node is already in the cache; if so, that 
> node is reused directly.
> The problem is that *for duplicate nodes, the behavior of the read and write 
> paths is inconsistent:*
> on the write path we reuse a single node, but on the read path multiple 
> copies are stored in the array, while the map keeps only one mapping from 
> the node to an array index.
> So, on the read path, when the first of the duplicate nodes is evicted we 
> remove its entry from the map, and then we may hit an NPE because when the 
> second duplicate is evicted it can no longer be found in the map.
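>  
> A minimal sketch of the two paths (assuming the LRUDictionary API of 
> init/findEntry/addEntry; it does not reproduce the eviction NPE itself):
> {code:java}
> LRUDictionary dict = new LRUDictionary();
> dict.init(Short.MAX_VALUE);
> byte[] data = Bytes.toBytes("row");
> 
> // Write path: findEntry deduplicates. The second call finds the node
> // added by the first one and reuses its index.
> dict.findEntry(data, 0, data.length); // miss: adds node, returns NOT_IN_DICTIONARY
> dict.findEntry(data, 0, data.length); // hit: returns the existing index
> 
> // Read path: addEntry is called directly, so two equal nodes occupy two
> // array slots while nodeToIndex holds only one mapping for both of them.
> dict.addEntry(data, 0, data.length);
> dict.addEntry(data, 0, data.length);
> {code}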



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27638) Get slow/large log responses that match the 'CLIENT_IP' without client port

2023-02-13 Thread mokai (Jira)
mokai created HBASE-27638:
-

 Summary: Get slow/large log responses that match the 'CLIENT_IP' 
without client port
 Key: HBASE-27638
 URL: https://issues.apache.org/jira/browse/HBASE-27638
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.4.14
Reporter: mokai


'get_largelog_responses' and 'get_slowlog_responses' support filtering the 
records for a given client by 'CLIENT_IP', but the user has to provide both 
the client IP and the client port. Since clients mostly use ephemeral ports, 
it would be a better user experience if 'CLIENT_IP' without a port were 
supported, so that all records matching the client IP are returned.
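
For illustration, a sketch of the current and proposed shell usage (the 
filter syntax follows the existing slow-log shell commands; the port-less 
form is the proposal, and the addresses are made up):
{noformat}
# today: the ephemeral port must be known for the filter to match anything
get_slowlog_responses '*', {'CLIENT_IP' => '192.168.1.10:53214'}

# proposed: match every record from the client, regardless of port
get_slowlog_responses '*', {'CLIENT_IP' => '192.168.1.10'}
{noformat}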



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27637) Zero length value would cause value compressor to read nothing and not advance the position of the InputStream

2023-02-13 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-27637:
-

 Summary: Zero length value would cause value compressor to read 
nothing and not advance the position of the InputStream
 Key: HBASE-27637
 URL: https://issues.apache.org/jira/browse/HBASE-27637
 Project: HBase
  Issue Type: Bug
  Components: dataloss, wal
Reporter: Duo Zhang


This is a code snippet from the discussion of HBASE-27073

{code:java}
public static void main(String[] args) throws Exception {
  CompressionContext ctx =
    new CompressionContext(LRUDictionary.class, false, false, true,
      Compression.Algorithm.GZ);
  ValueCompressor compressor = ctx.getValueCompressor();
  byte[] compressed = compressor.compress(new byte[0], 0, 0);
  System.out.println("compressed length: " + compressed.length);
  ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
  int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
  System.out.println("read length: " + read);
  System.out.println("position: " + (compressed.length - bis.available()));
}
{code}

And the output is
{noformat}
compressed length: 20
read length: 0
position: 0
{noformat}

So it turns out that, when compressing, an empty array still generates some 
output bytes, but when reading we skip reading anything if the expected output 
length is zero. The next read from the stream therefore starts at the wrong 
position...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27590) Change Iterable to List in SnapshotFileCache

2023-02-13 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-27590.
---
Resolution: Fixed

Cherry-picked to branch-2.4.

> Change Iterable to List in SnapshotFileCache
> 
>
> Key: HBASE-27590
> URL: https://issues.apache.org/jira/browse/HBASE-27590
> Project: HBase
>  Issue Type: Improvement
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
> Attachments: flame-1.html
>
>
> The HFileCleaners can have low performance on a large /archive area when 
> used with slow storage like S3. The snapshot write lock in SnapshotFileCache 
> is held while the file metadata is fetched from S3, so even with multiple 
> cleaner threads only a single cleaner can effectively delete files from the 
> archive.
> Simply changing the parameter type in FileCleanerDelegate from Iterable to 
> List allows the file metadata collection to be performed before 
> SnapshotHFileCleaner runs.
> Running with the cleaner configuration below, I observed that the time the 
> lock was held in SnapshotFileCache went down from 45000ms to 100ms for 1000 
> files in a directory. The complete evaluation and deletion for this folder 
> took the same time, but since the file metadata fetch from S3 was done 
> outside of the lock, the multiple cleaner threads were able to run 
> concurrently.
> {noformat}
> hbase.cleaner.directory.sorting=false
> hbase.cleaner.scan.dir.concurrent.size=0.75
> hbase.regionserver.hfilecleaner.small.thread.count=16
> hbase.regionserver.hfilecleaner.large.thread.count=8
> {noformat}
> The files to evaluate are already passed as a List to 
> CleanerChore.checkAndDeleteFiles, but they are converted to an Iterable to 
> run the checks on the configured cleaners.
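>  
> A sketch of the interface change described above (getDeletableFiles is the 
> cleaner delegate hook; exact signatures may differ between branches):
> {code:java}
> // before: the Iterable can be a lazy view, so the metadata fetch happens
> // during iteration, inside the SnapshotFileCache lock
> Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files);
> 
> // after: a materialized List lets the metadata be collected up front,
> // before SnapshotHFileCleaner takes the lock
> Iterable<FileStatus> getDeletableFiles(List<FileStatus> files);
> {code}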



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-27590) Change Iterable to List in SnapshotFileCache

2023-02-13 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi reopened HBASE-27590:
---

> Change Iterable to List in SnapshotFileCache
> 
>
> Key: HBASE-27590
> URL: https://issues.apache.org/jira/browse/HBASE-27590
> Project: HBase
>  Issue Type: Improvement
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
> Attachments: flame-1.html
>
>
> The HFileCleaners can have low performance on a large /archive area when 
> used with slow storage like S3. The snapshot write lock in SnapshotFileCache 
> is held while the file metadata is fetched from S3, so even with multiple 
> cleaner threads only a single cleaner can effectively delete files from the 
> archive.
> Simply changing the parameter type in FileCleanerDelegate from Iterable to 
> List allows the file metadata collection to be performed before 
> SnapshotHFileCleaner runs.
> Running with the cleaner configuration below, I observed that the time the 
> lock was held in SnapshotFileCache went down from 45000ms to 100ms for 1000 
> files in a directory. The complete evaluation and deletion for this folder 
> took the same time, but since the file metadata fetch from S3 was done 
> outside of the lock, the multiple cleaner threads were able to run 
> concurrently.
> {noformat}
> hbase.cleaner.directory.sorting=false
> hbase.cleaner.scan.dir.concurrent.size=0.75
> hbase.regionserver.hfilecleaner.small.thread.count=16
> hbase.regionserver.hfilecleaner.large.thread.count=8
> {noformat}
> The files to evaluate are already passed as a List to 
> CleanerChore.checkAndDeleteFiles, but they are converted to an Iterable to 
> run the checks on the configured cleaners.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27629) Backport HBASE-27043 to branch-2.4

2023-02-13 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-27629.
---
Fix Version/s: 2.4.17
   Resolution: Fixed

Merged to branch-2.4.

> Backport HBASE-27043 to branch-2.4
> --
>
> Key: HBASE-27629
> URL: https://issues.apache.org/jira/browse/HBASE-27629
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.4.17
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27636) The value of region metric lastMajorCompactionAge is wrong when the HFile is a bulkloaded file

2023-02-13 Thread selina.yan (Jira)
selina.yan created HBASE-27636:
--

 Summary: The value of region metric lastMajorCompactionAge is 
wrong when the HFile is a bulkloaded file
 Key: HBASE-27636
 URL: https://issues.apache.org/jira/browse/HBASE-27636
 Project: HBase
  Issue Type: Bug
  Components: hbase-connectors
Reporter: selina.yan
Assignee: selina.yan


When HFileOutputFormat2 is used to create an HFile, fileCreateTime is not 
assigned when the HFileContext is built. After the file is bulkloaded into 
HBase, getOldestHfileTs() therefore sees the default create time of 0 and 
returns 0, so the region's lastMajorCompactionAge metric is computed as 
now - 0, i.e. the current timestamp.

 
{code:java}
// HFileOutputFormat2.class
HFileContextBuilder contextBuilder = new HFileContextBuilder()
  .withCompression(compression)
  .withChecksumType(HStore.getChecksumType(conf))
  .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
  .withBlockSize(blockSize);

if (HFile.getFormatVersion(conf) >= HFile.MIN_FORMAT_VERSION_WITH_TAGS) {
  contextBuilder.withIncludesTags(true);
}

contextBuilder.withDataBlockEncoding(encoding);
HFileContext hFileContext = contextBuilder.build();

// computing the lastMajorCompactionTs metric
long lastMajorCompactionTs = 0L;
try {
  lastMajorCompactionTs = this.region.getOldestHfileTs(true);
} catch (IOException ioe) {
  LOG.error("Could not load HFile info ", ioe);
}
long now = EnvironmentEdgeManager.currentTime();
return now - lastMajorCompactionTs;

// HRegion#getOldestHfileTs
public long getOldestHfileTs(boolean majorCompactionOnly) throws IOException {
  long result = Long.MAX_VALUE;
  for (HStore store : stores.values()) {
    Collection<HStoreFile> storeFiles = store.getStorefiles();
    ...
    for (HStoreFile file : storeFiles) {
      StoreFileReader sfReader = file.getReader();
      ...
      result = Math.min(result, reader.getFileContext().getFileCreateTime());
    }
  }
  return result == Long.MAX_VALUE ? 0 : result;
}
{code}
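 
A possible direction for a fix, sketched under the assumption that 
HFileContextBuilder exposes a withCreateTime setter, would be to set the 
create time when the context is built in HFileOutputFormat2:
{code:java}
// Sketch only: give bulkloaded HFiles a real fileCreateTime instead of
// the default 0, so getOldestHfileTs() returns a meaningful timestamp.
HFileContextBuilder contextBuilder = new HFileContextBuilder()
  .withCompression(compression)
  .withChecksumType(HStore.getChecksumType(conf))
  .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
  .withBlockSize(blockSize)
  .withCreateTime(EnvironmentEdgeManager.currentTime()); // assumed setter
{code}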
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)