Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-15 Thread via GitHub


chenboat merged PR #12744:
URL: https://github.com/apache/pinot/pull/12744


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-12 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1563397223


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##
@@ -181,6 +181,17 @@ private MutableRoaringBitmap getPinotDocIds(IndexSearcher 
indexSearcher, Mutable
 return actualDocIDs;
   }
 
+  @Override
+  public void commit() {
+try {
+  _indexCreator.getIndexWriter().commit();
+} catch (Exception e) {
+  LOGGER.error("Failed to commit the realtime lucene text index for column 
{}, exception {}", _column,
+  e.getMessage());

Review Comment:
   I meant LOGGER.err("Failed.. .", _column, e);



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


itschrispeck commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558836542


##
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/mutable/MutableIndex.java:
##
@@ -60,4 +60,12 @@ public interface MutableIndex extends IndexReader {
* @param docId The document id of the given row. A non-negative value.
*/
   void add(@Nonnull Object[] values, @Nullable int[] dictIds, int docId);
+
+  /**
+   * Commits the mutable index artifacts to disk. This is used in preparation 
for realtime segment conversion.
+   * commit() should perform any required actions before using mutable segment 
artifacts to optimize immutable

Review Comment:
   commit to disk should be the main mechanism for index reuse for any indexes 
using this path - disk is an intermediary. Other actions refers to actions a 
MutableIndex implementation might want to perform before writing some artifacts 
to disk



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


itschrispeck commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558835084


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java:
##
@@ -90,14 +102,17 @@ public static HashSet 
getDefaultEnglishStopWordsSet() {
*   to offline), we close this lucene index writer to release 
resources but don't commit.
* @param config the text index config
*/
-  public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean 
commit, TextIndexConfig config) {
+  public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean 
commit, boolean realtimeConversion,
+  @Nullable int[] immutableToMutableIdMap, TextIndexConfig config) {
 _textColumn = column;
+
+// to reuse the mutable index, it must be (1) not the realtime index, and 
(2) is realtime segment conversion

Review Comment:
   clarified in the code - `LuceneTextIndexCreator` is used for both in 
`RealtimeLuceneTextIndex` and separately for the offline lucene text index. We 
use the `commit` flag (set only for offline index) to avoid trying to convert 
an index if the instance is created by `RealtimeLuceneTextIndex`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


itschrispeck commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558833921


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##
@@ -181,6 +181,17 @@ private MutableRoaringBitmap getPinotDocIds(IndexSearcher 
indexSearcher, Mutable
 return actualDocIDs;
   }
 
+  @Override
+  public void commit() {
+try {
+  _indexCreator.getIndexWriter().commit();
+} catch (Exception e) {
+  LOGGER.error("Failed to commit the realtime lucene text index for column 
{}, exception {}", _column,
+  e.getMessage());

Review Comment:
   didn't quite follow this, can you elaborate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558152978


##
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/mutable/MutableIndex.java:
##
@@ -60,4 +60,12 @@ public interface MutableIndex extends IndexReader {
* @param docId The document id of the given row. A non-negative value.
*/
   void add(@Nonnull Object[] values, @Nullable int[] dictIds, int docId);
+
+  /**
+   * Commits the mutable index artifacts to disk. This is used in preparation 
for realtime segment conversion.
+   * commit() should perform any required actions before using mutable segment 
artifacts to optimize immutable

Review Comment:
   Comments in line 65-66 seem to be self-conflicting: commit to disk is just 
one possible action? the implementation can actually choose other action for 
preparation. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558146827


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java:
##
@@ -90,14 +102,17 @@ public static HashSet 
getDefaultEnglishStopWordsSet() {
*   to offline), we close this lucene index writer to release 
resources but don't commit.
* @param config the text index config
*/
-  public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean 
commit, TextIndexConfig config) {
+  public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean 
commit, boolean realtimeConversion,
+  @Nullable int[] immutableToMutableIdMap, TextIndexConfig config) {
 _textColumn = column;
+
+// to reuse the mutable index, it must be (1) not the realtime index, and 
(2) is realtime segment conversion

Review Comment:
   what does (1) mean exactly? "not the realtime index"? the code says commit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558134729


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##
@@ -181,6 +181,17 @@ private MutableRoaringBitmap getPinotDocIds(IndexSearcher 
indexSearcher, Mutable
 return actualDocIDs;
   }
 
+  @Override
+  public void commit() {
+try {
+  _indexCreator.getIndexWriter().commit();
+} catch (Exception e) {
+  LOGGER.error("Failed to commit the realtime lucene text index for column 
{}, exception {}", _column,
+  e.getMessage());

Review Comment:
   why not just log the error e?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558089422


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/converter/RealtimeSegmentConverter.java:
##
@@ -118,6 +118,9 @@ public void build(@Nullable SegmentVersion segmentVersion, 
ServerMetrics serverM
 genConfig.setNullHandlingEnabled(_nullHandlingEnabled);
 genConfig.setSegmentZKPropsConfig(_segmentZKPropsConfig);
 
+// flush any artifacts to disk to improve immutable segment build

Review Comment:
   Flushing to disk is just one kind of commit action, right? See the comments 
added for the commit() call. I think in general all kind of preparation is 
possible?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-09 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1558086670


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##
@@ -932,6 +932,19 @@ public Object getValue(int docId, String column) {
 }
   }
 
+  /**
+   * Calls commit() on all mutable indexes. This is used in preparation for 
realtime segment conversion.
+   * .commit() can be implemented per index to perform any required actions 
before using mutable segment
+   * artifacts to optimize imutable segment build.

Review Comment:
   imutable --> immutable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-04 Thread via GitHub


wirybeaver commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1552866049


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##
@@ -218,6 +224,19 @@ public void init(SegmentGeneratorConfig config, 
SegmentCreationDataSource dataSo
 LOGGER.debug("tempIndexDir:{}", _tempIndexDir);
   }
 
+  // input int[] can be used to map sortedDocIds[immutableId] = mutableId 
(based on RecordReader iteration order)
+  // output int[] should be used to map output[mutableId] = immutableId
+  private int[] convertSortedDocIds(@Nullable int[] sortedDocIds) {

Review Comment:
   IIUC, if the table have sort index, then the docIds need to be sorted 
accordingly when generate immutable segment?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-04 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1552581249


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java:
##
@@ -74,7 +85,8 @@ public static HashSet getDefaultEnglishStopWordsSet() 
{
* @param column column name
* @param segmentIndexDir segment index directory
* @param commit true if the index should be committed (at the end after all 
documents have
-   *   been added), false if index should not be committed
+   *   been added), false if index should not be
+   * @param sortedDocIds sortedDocIds from segment conversion

Review Comment:
   same as my earlier comment on the naming. Also since this is a public 
method, we need more explanation and some example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-04-04 Thread via GitHub


chenboat commented on code in PR #12744:
URL: https://github.com/apache/pinot/pull/12744#discussion_r1552580631


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##
@@ -218,6 +224,19 @@ public void init(SegmentGeneratorConfig config, 
SegmentCreationDataSource dataSo
 LOGGER.debug("tempIndexDir:{}", _tempIndexDir);
   }
 
+  // input int[] can be used to map sortedDocIds[immutableId] = mutableId 
(based on RecordReader iteration order)
+  // output int[] should be used to map output[mutableId] = immutableId
+  private int[] convertSortedDocIds(@Nullable int[] sortedDocIds) {

Review Comment:
   sortedDocIds makes people think the docIds in the array is sorted. In this 
case it is not -- you probably mean immutableId is sorted as array index. Can 
we find a better name here with some explanation and example? e.g., 
immutableIdToMutableId map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-03-28 Thread via GitHub


codecov-commenter commented on PR #12744:
URL: https://github.com/apache/pinot/pull/12744#issuecomment-2026286837

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/12744?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `0%` with `100 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 0.00%. Comparing base 
[(`59551e4`)](https://app.codecov.io/gh/apache/pinot/commit/59551e45224f1535c4863fd577622b37366ccc97?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`6ba15ec`)](https://app.codecov.io/gh/apache/pinot/pull/12744?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 184 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/pinot/pull/12744?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...ment/creator/impl/text/LuceneTextIndexCreator.java](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Fsegment%2Fcreator%2Fimpl%2Ftext%2FLuceneTextIndexCreator.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC90ZXh0L0x1Y2VuZVRleHRJbmRleENyZWF0b3IuamF2YQ==)
 | 0.00% | [61 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 
[...t/creator/impl/SegmentIndexCreationDriverImpl.java](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Fsegment%2Fcreator%2Fimpl%2FSegmentIndexCreationDriverImpl.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50SW5kZXhDcmVhdGlvbkRyaXZlckltcGwuamF2YQ==)
 | 0.00% | [12 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 
[...inot/segment/spi/creator/IndexCreationContext.java](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree=pinot-segment-spi%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Fspi%2Fcreator%2FIndexCreationContext.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvSW5kZXhDcmVhdGlvbkNvbnRleHQuamF2YQ==)
 | 0.00% | [10 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 
[...me/impl/invertedindex/RealtimeLuceneTextIndex.java](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Frealtime%2Fimpl%2Finvertedindex%2FRealtimeLuceneTextIndex.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWFsdGltZS9pbXBsL2ludmVydGVkaW5kZXgvUmVhbHRpbWVMdWNlbmVUZXh0SW5kZXguamF2YQ==)
 | 0.00% | [7 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 
[...local/indexsegment/mutable/MutableSegmentImpl.java](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Findexsegment%2Fmutable%2FMutableSegmentImpl.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9NdXRhYmxlU2VnbWVudEltcGwuamF2YQ==)
 | 0.00% | [6 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12744?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 

[PR] Improved segment build time for Lucene text index realtime to offline conversion [pinot]

2024-03-28 Thread via GitHub


itschrispeck opened a new pull request, #12744:
URL: https://github.com/apache/pinot/pull/12744

   **Problem:**
   We typically see long (7-10min) segment build times when using Lucene index 
with 1-1.5GB segment sizes. 70-80% of this time is spent building the Lucene 
text index. 
   
   **Background:**
   In the existing implementation the Lucene index stores Pinot docIds: for the 
mutable segment these are the 'mutable' docIds, for the immutable segment we 
store each row with its new docId. Lucene queries return the matching Lucene 
DocIds, and we compute these on the fly for the mutable index, or from a 
mapping file for the immutable index. 
   
   **Change Summary:**
   This change copies the mutable Lucene index during realtime segment 
conversion to reuse, instead of building a new Lucene index. To handle the 
potential docId change `sortedDocIds` is added to `IndexCreationContext` to 
compute a temporary mapping between the mutable docId and the immutable 
segment's docId. This temporary mapping is used during segment conversion to 
build the mapping file between the Lucene docId and the new immutable segment's 
docId. This mapping file is built during segment conversion, instead of during 
segment load in the traditional path.
   
   Internally we've seen roughly 40-60% improvement in overall segment build 
time. The lower peaks are from a table/tenant with this change, the higher 
ingestion delay peaks are from an identical table in a tenant without this 
change:
   
   https://github.com/apache/pinot/assets/27231838/0ab23a4c-f7d3-4332-9c5b-e662925c6f9c;>
   
   Testing: deployed internally, local testing, validated basic 
pause/restart/reload operations on a table to ensure no regression in 
TextIndexHandler index build.
   
   tags: ingestion `performance`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org