steffenvan commented on code in PR #1629: URL: https://github.com/apache/jackrabbit-oak/pull/1629#discussion_r1709498634
##########
oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/binary/TextExtractionStats.java:
##########
@@ -53,25 +65,49 @@ public void log(boolean reindex) {
         }
     }
 
-    public void collectStats(ExtractedTextCache cache){
-        cache.addStats(count, totalTime, totalBytesRead, totalTextLength);
+    public long finishExtraction(long bytesRead, int extractedTextLength) {
+        long elapsedNanos = System.nanoTime() - currentExtractionStartNanos;
+        numberOfExtractions++;
+        totalBytesRead += bytesRead;
+        totalExtractedTextLength += extractedTextLength;
+        totalExtractionTimeNanos += elapsedNanos;
+        return elapsedNanos/1_000_000;
+    }
+
+    public void collectStats(ExtractedTextCache cache) {
+        cache.addStats(numberOfExtractions, totalExtractionTimeNanos/1_000_000, totalBytesRead, totalExtractedTextLength);
     }
 
     private boolean isTakingLotsOfTime() {
-        return totalTime > LOGGING_THRESHOLD;
+        return totalExtractionTimeNanos > LOGGING_THRESHOLD*1_000_000;

Review Comment:
Minor formatting point: should we put spaces around binary infix operators, e.g. `LOGGING_THRESHOLD * 1_000_000`? We already do that in some places, so it would be nice to be consistent.
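
To make the suggestion concrete, here is a minimal sketch of the two kinds of expression from the hunk, written with spaces around `/`, `>` and `*`. The class name, the threshold value and its millisecond unit are assumptions for illustration only; they are not taken from the PR.

```java
// Hypothetical stand-in for the expressions in the diff; not the actual Oak class.
final class OperatorSpacingSketch {

    // Assumed value, interpreted as milliseconds, for illustration only.
    static final long LOGGING_THRESHOLD = 60_000;

    // Nanoseconds to milliseconds, with spaces around '/'.
    static long toMillis(long nanos) {
        return nanos / 1_000_000;
    }

    // Threshold check in nanoseconds, with spaces around '>' and '*'.
    static boolean isTakingLotsOfTime(long totalExtractionTimeNanos) {
        return totalExtractionTimeNanos > LOGGING_THRESHOLD * 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(toMillis(2_500_000L));
        System.out.println(isTakingLotsOfTime(61_000_000_000L));
    }
}
```

Only the whitespace differs from the PR; the behaviour of the expressions is unchanged.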