Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-06 Thread via GitHub


Jackie-Jiang merged PR #17696:
URL: https://github.com/apache/pinot/pull/17696


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-06 Thread via GitHub


Jackie-Jiang commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2894418403


##
pinot-core/src/main/java/org/apache/pinot/core/common/MinionConstants.java:
##
@@ -257,6 +262,16 @@ public static class UpsertCompactionTask {
  * number of segments to query in one batch to fetch valid doc id 
metadata, by default 500
  */
 public static final String NUM_SEGMENTS_BATCH_PER_SERVER_REQUEST = 
"numSegmentsBatchPerServerRequest";
+
+/**
+ * Valid doc ids comparison mode used by the executor only (generator 
unchanged). Values: UNSAFE, EQUAL,
+ * MOST_VALID_DOCS. UNSAFE = use first server with matching CRC and READY; 
EQUAL = require all replicas
+ * to have the same valid doc set (default); MOST_VALID_DOCS = use replica 
with most valid docs.
+ */
+public static final String VALID_DOC_IDS_COMPARISON_MODE_KEY = 
"validDocIdsComparisonMode";

Review Comment:
   Make the key consistent with enum: `validDocIdsConsensusMode`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-04 Thread via GitHub


deepthi912 commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-4001154888

   > We can consider adding a flag to also run the check on the generator side
   
   Will add a followup PR for this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-04 Thread via GitHub


deepthi912 commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2886532554


##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -67,6 +66,21 @@
 public class MinionTaskUtils {
   private static final Logger LOGGER = 
LoggerFactory.getLogger(MinionTaskUtils.class);
 
+  /** Valid doc ids comparison mode (executor-only). Kept internal; executors 
pass config string. */
+  enum ValidDocIdsComparisonMode {

Review Comment:
   I made it EQUAL instead of SAME



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-04 Thread via GitHub


deepthi912 commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2886118584


##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,66 +295,111 @@ public static boolean 
extractMinionAllowDownloadFromServer(TableConfig tableConf
   }
 
   /**
-   * Returns the validDocID bitmap from the server whose local segment crc 
matches both crc of ZK metadata and
-   * deepstore copy (expectedCrc).
+   * Returns the validDocIds bitmap from server(s). {@code comparisonMode} is 
the task config value: NONE,
+   * EQUAL_CONSENSUS(default), or MAX_VALID_DOCS.
*/
   @Nullable
   public static RoaringBitmap getValidDocIdFromServerMatchingCrc(String 
tableNameWithType, String segmentName,
-  String validDocIdsType, MinionContext minionContext, String expectedCrc) 
{
+  String validDocIdsType, MinionContext minionContext, String expectedCrc, 
String comparisonModeStr) {
+ValidDocIdsComparisonMode comparisonMode = 
parseValidDocIdsComparisonMode(comparisonModeStr);
 String clusterName = minionContext.getHelixManager().getClusterName();
 HelixAdmin helixAdmin = 
minionContext.getHelixManager().getClusterManagmentTool();
-RoaringBitmap validDocIds = null;
 List servers = getServers(segmentName, tableNameWithType, 
helixAdmin, clusterName);
+List matchingBitmaps = new ArrayList<>();
+
 for (String server : servers) {
   InstanceConfig instanceConfig = 
helixAdmin.getInstanceConfig(clusterName, server);
   String endpoint = InstanceUtils.getServerAdminEndpoint(instanceConfig);
 
-  // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
-  // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
   ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
+if (comparisonMode == ValidDocIdsComparisonMode.EQUAL_CONSENSUS) {
+  throw new IllegalStateException(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint, e);
+}
+LOGGER.warn(
+"Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint, e);
 continue;
   }
 
+  String crcFromValidDocIdsBitmap = 
validDocIdsBitmapResponse.getSegmentCrc();
   // Check crc from the downloaded segment against the crc returned from 
the server along with the valid doc id
   // bitmap. If this doesn't match, this means that we are hitting the 
race condition where the segment has been
   // uploaded successfully while the server is still reloading the 
segment. Reloading can take a while when the
   // offheap upsert is used because we will need to delete & add all 
primary keys.
   // `BaseSingleSegmentConversionExecutor.executeTask()` already checks 
for the crc from the task generator
   // against the crc from the current segment zk metadata, so we don't 
need to check that here.
-  String crcFromValidDocIdsBitmap = 
validDocIdsBitmapResponse.getSegmentCrc();
   if (!expectedCrc.equals(crcFromValidDocIdsBitmap)) {
-// In this scenario, we are hitting the other replica of the segment 
which did not commit to ZK or deepstore.
-// We will skip processing this bitmap to query other server to 
confirm if there is a valid matching CRC.
-String message = "CRC mismatch for segment: " + segmentName + ", 
expected value based on task generator: "
-+ expectedCrc + ", actual crc from validDocIdsBitmapResponse from 
endpoint " + endpoint + ": "
-+ crcFromValidDocIdsBitmap;
-LOGGER.warn(message);
+if (comparisonMode == ValidDocIdsComparisonMode.EQUAL_CONSENSUS) {
+  throw new IllegalStateException(
+  "CRC mismatch for segment: " + segmentName + ", expected: " + 
expectedCrc + ", actual from endpoint "
+  + endpoint + ": " + crcFromValidDocIdsBitmap);
+}
+LOGGER.warn("CRC mismatch for segment: {} from endpoint {}, skipping", 
segmentName, endpoint);
 continue;
   }
 
-  // skipping servers which are not in READY state. The bitmaps would be 
inconsistent when
-  // server is NOT READY as UPDATING segments might be updating the ONLINE 
segments
   if (validDocIdsBitmapResponse.getServerStatus() != null && 
!validDocIdsBitmapResponse.getServerStatus()
   .equals(ServiceStatus.Sta

Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-04 Thread via GitHub


Jackie-Jiang commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-4000214228

   We can consider adding a flag to also run the check on the generator side


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-04 Thread via GitHub


Jackie-Jiang commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2886014045


##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,66 +295,111 @@ public static boolean 
extractMinionAllowDownloadFromServer(TableConfig tableConf
   }
 
   /**
-   * Returns the validDocID bitmap from the server whose local segment crc 
matches both crc of ZK metadata and
-   * deepstore copy (expectedCrc).
+   * Returns the validDocIds bitmap from server(s). {@code comparisonMode} is 
the task config value: NONE,
+   * EQUAL_CONSENSUS(default), or MAX_VALID_DOCS.
*/
   @Nullable
   public static RoaringBitmap getValidDocIdFromServerMatchingCrc(String 
tableNameWithType, String segmentName,
-  String validDocIdsType, MinionContext minionContext, String expectedCrc) 
{
+  String validDocIdsType, MinionContext minionContext, String expectedCrc, 
String comparisonModeStr) {
+ValidDocIdsComparisonMode comparisonMode = 
parseValidDocIdsComparisonMode(comparisonModeStr);
 String clusterName = minionContext.getHelixManager().getClusterName();
 HelixAdmin helixAdmin = 
minionContext.getHelixManager().getClusterManagmentTool();
-RoaringBitmap validDocIds = null;
 List servers = getServers(segmentName, tableNameWithType, 
helixAdmin, clusterName);
+List matchingBitmaps = new ArrayList<>();
+
 for (String server : servers) {
   InstanceConfig instanceConfig = 
helixAdmin.getInstanceConfig(clusterName, server);
   String endpoint = InstanceUtils.getServerAdminEndpoint(instanceConfig);
 
-  // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
-  // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
   ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
+if (comparisonMode == ValidDocIdsComparisonMode.EQUAL_CONSENSUS) {

Review Comment:
   How about the mode with most valid docs?



##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,66 +295,111 @@ public static boolean 
extractMinionAllowDownloadFromServer(TableConfig tableConf
   }
 
   /**
-   * Returns the validDocID bitmap from the server whose local segment crc 
matches both crc of ZK metadata and
-   * deepstore copy (expectedCrc).
+   * Returns the validDocIds bitmap from server(s). {@code comparisonMode} is 
the task config value: NONE,
+   * EQUAL_CONSENSUS(default), or MAX_VALID_DOCS.
*/
   @Nullable
   public static RoaringBitmap getValidDocIdFromServerMatchingCrc(String 
tableNameWithType, String segmentName,
-  String validDocIdsType, MinionContext minionContext, String expectedCrc) 
{
+  String validDocIdsType, MinionContext minionContext, String expectedCrc, 
String comparisonModeStr) {
+ValidDocIdsComparisonMode comparisonMode = 
parseValidDocIdsComparisonMode(comparisonModeStr);
 String clusterName = minionContext.getHelixManager().getClusterName();
 HelixAdmin helixAdmin = 
minionContext.getHelixManager().getClusterManagmentTool();
-RoaringBitmap validDocIds = null;
 List servers = getServers(segmentName, tableNameWithType, 
helixAdmin, clusterName);
+List matchingBitmaps = new ArrayList<>();
+
 for (String server : servers) {
   InstanceConfig instanceConfig = 
helixAdmin.getInstanceConfig(clusterName, server);
   String endpoint = InstanceUtils.getServerAdminEndpoint(instanceConfig);
 
-  // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
-  // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
   ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
+if (comparisonMode == ValidDocIdsComparisonMode.EQUAL_CONSENSUS) {
+  throw new IllegalStateException(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint, e);

Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-03-02 Thread via GitHub


deepthi912 commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-3985464078

   > Is it correct to perform the check on generator side, or should we perform 
the check on the executor side? We don't want to have the same check in both 
places. I think skipping the segment on the executor side is the correct 
approach. When a segment is completely empty, who is responsible for deleting 
it? Generator or executor?
   > 
   > cc @tarun11Mavani It would be good if you also join this discussion
   
   Performing it only on executor makes much more sense, when the segment is 
empty executor is responsible for deleting it. @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-13 Thread via GitHub


Jackie-Jiang commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-3899528426

   Is it correct to perform the check on generator side, or should we perform 
the check on the executor side? We don't want to have the same check in both 
places. I think skipping the segment on the executor side is the correct 
approach.
   When a segment is completely empty, who is responsible for deleting it? 
Generator or executor?
   
   cc @tarun11Mavani It would be good if you also join this discussion


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


deepthi912 commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-3894844314

   https://github.com/apache/pinot/pull/17352
   Thanks for pointing me out here, it does look for consensus but also not 
solving it in Compaction. Only used in CompactMergeTask 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


deepthi912 commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2802258809


##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,47 +293,67 @@ public static RoaringBitmap 
getValidDocIdFromServerMatchingCrc(String tableNameW
   // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
   // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
-  ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
+  ValidDocIdsBitmapResponse validDocIdsBitmapResponse = null;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
-continue;
+// We need validDocIds from all servers; do not continue if any server 
fails to return the bitmap.
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {
+  throw new IllegalStateException(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint
+  + ". ValidDocIds bitmap is required from all servers holding 
the segment.", e);
+} else {
+  LOGGER.warn(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint, e);
+  continue;
+}
   }
 
   // Check crc from the downloaded segment against the crc returned from 
the server along with the valid doc id
-  // bitmap. If this doesn't match, this means that we are hitting the 
race condition where the segment has been
-  // uploaded successfully while the server is still reloading the 
segment. Reloading can take a while when the
-  // offheap upsert is used because we will need to delete & add all 
primary keys.
-  // `BaseSingleSegmentConversionExecutor.executeTask()` already checks 
for the crc from the task generator
-  // against the crc from the current segment zk metadata, so we don't 
need to check that here.
+  // bitmap. If this doesn't match, we are hitting a replica that has not 
committed to ZK/deepstore yet (e.g.
+  // still reloading). We require all servers to have matching CRC to 
avoid partial upsert inconsistencies.
   String crcFromValidDocIdsBitmap = 
validDocIdsBitmapResponse.getSegmentCrc();
   if (!expectedCrc.equals(crcFromValidDocIdsBitmap)) {
-// In this scenario, we are hitting the other replica of the segment 
which did not commit to ZK or deepstore.
-// We will skip processing this bitmap to query other server to 
confirm if there is a valid matching CRC.
-String message = "CRC mismatch for segment: " + segmentName + ", 
expected value based on task generator: "
-+ expectedCrc + ", actual crc from validDocIdsBitmapResponse from 
endpoint " + endpoint + ": "
-+ crcFromValidDocIdsBitmap;
-LOGGER.warn(message);
-continue;
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {
+  throw new IllegalStateException("CRC mismatch for segment: " + 
segmentName + " from endpoint: " + endpoint
+  + ". Expected CRC (from task generator): " + expectedCrc + ", 
actual from server: "
+  + crcFromValidDocIdsBitmap
+  + ". ValidDocIds bitmap is required from all servers with 
matching CRC to avoid replica inconsistency.");
+} else {
+  continue;
+}
   }
 
-  // skipping servers which are not in READY state. The bitmaps would be 
inconsistent when
-  // server is NOT READY as UPDATING segments might be updating the ONLINE 
segments
+  // Require all servers to be READY. Bitmaps are inconsistent when server 
is NOT READY (e.g. UPDATING segments).
   if (validDocIdsBitmapResponse.getServerStatus() != null && 
!validDocIdsBitmapResponse.getServerStatus()
   .equals(ServiceStatus.Status.GOOD)) {
-String message = "Server " + validDocIdsBitmapResponse.getInstanceId() 
+ " is in "
-+ validDocIdsBitmapResponse.getServerStatus() + " state, skipping 
it for execution for segment: "
-+ validDocIdsBitmapResponse.getSegmentName() + ". Will try other 
servers.";
-LOGGER.warn(message);
-continue;
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {
+  throw new IllegalStateException("Server " + 
validDocIdsBitmapResponse.getInstanceId() + " is in "
+  + validDocIdsBitmapResponse.getServerStatus() + " state for 
segm

Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


Jackie-Jiang commented on code in PR #17696:
URL: https://github.com/apache/pinot/pull/17696#discussion_r2802207592


##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,47 +293,67 @@ public static RoaringBitmap 
getValidDocIdFromServerMatchingCrc(String tableNameW
   // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
   // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
-  ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
+  ValidDocIdsBitmapResponse validDocIdsBitmapResponse = null;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
-continue;
+// We need validDocIds from all servers; do not continue if any server 
fails to return the bitmap.
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {

Review Comment:
   Perform this check on the caller side, and pass in as a boolean flag (e.g. 
`strict`)



##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java:
##
@@ -281,47 +293,67 @@ public static RoaringBitmap 
getValidDocIdFromServerMatchingCrc(String tableNameW
   // We only need aggregated table size and the total number of docs/rows. 
Skipping column related stats, by
   // passing an empty list.
   ServerSegmentMetadataReader serverSegmentMetadataReader = new 
ServerSegmentMetadataReader();
-  ValidDocIdsBitmapResponse validDocIdsBitmapResponse;
+  ValidDocIdsBitmapResponse validDocIdsBitmapResponse = null;
   try {
 validDocIdsBitmapResponse =
 
serverSegmentMetadataReader.getValidDocIdsBitmapFromServer(tableNameWithType, 
segmentName, endpoint,
 validDocIdsType, 60_000);
   } catch (Exception e) {
-LOGGER.warn("Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: "
-+ endpoint, e);
-continue;
+// We need validDocIds from all servers; do not continue if any server 
fails to return the bitmap.
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {
+  throw new IllegalStateException(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint
+  + ". ValidDocIds bitmap is required from all servers holding 
the segment.", e);
+} else {
+  LOGGER.warn(
+  "Unable to retrieve validDocIds bitmap for segment: " + 
segmentName + " from endpoint: " + endpoint, e);
+  continue;
+}
   }
 
   // Check crc from the downloaded segment against the crc returned from 
the server along with the valid doc id
-  // bitmap. If this doesn't match, this means that we are hitting the 
race condition where the segment has been
-  // uploaded successfully while the server is still reloading the 
segment. Reloading can take a while when the
-  // offheap upsert is used because we will need to delete & add all 
primary keys.
-  // `BaseSingleSegmentConversionExecutor.executeTask()` already checks 
for the crc from the task generator
-  // against the crc from the current segment zk metadata, so we don't 
need to check that here.
+  // bitmap. If this doesn't match, we are hitting a replica that has not 
committed to ZK/deepstore yet (e.g.
+  // still reloading). We require all servers to have matching CRC to 
avoid partial upsert inconsistencies.
   String crcFromValidDocIdsBitmap = 
validDocIdsBitmapResponse.getSegmentCrc();
   if (!expectedCrc.equals(crcFromValidDocIdsBitmap)) {
-// In this scenario, we are hitting the other replica of the segment 
which did not commit to ZK or deepstore.
-// We will skip processing this bitmap to query other server to 
confirm if there is a valid matching CRC.
-String message = "CRC mismatch for segment: " + segmentName + ", 
expected value based on task generator: "
-+ expectedCrc + ", actual crc from validDocIdsBitmapResponse from 
endpoint " + endpoint + ": "
-+ crcFromValidDocIdsBitmap;
-LOGGER.warn(message);
-continue;
+if (TableConfigUtils.checkForInconsistentStateConfigs(tableConfig)) {
+  throw new IllegalStateException("CRC mismatch for segment: " + 
segmentName + " from endpoint: " + endpoint
+  + ". Expected CRC (from task generator): " + expectedCrc + 

Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


codecov-commenter commented on PR #17696:
URL: https://github.com/apache/pinot/pull/17696#issuecomment-3894652235

   ### :x: 2 Tests Failed:
   | Tests completed | Failed | Passed | Skipped |
   |---|---|---|---|
   | 6638 | 2 | 6636 | 5 |
   View the full list of 2 :snowflake: flaky test(s)
   
   > org.apache.pinot.plugin.inputformat.json.confluent.JsonConfluentSchemaTest::@BeforeClass
 setup
   > **Flake rate in main:** 100.00% (Passed 0 times, Failed 35 times)
   > Stack Traces | 0.373s run time
   > 
   > > Could not find a valid Docker 
environment. Please see logs and check configuration
   > 
   > 
   
   
   > org.apache.pinot.plugin.inputformat.json.confluent.JsonConfluentSchemaTest::setup
   > **Flake rate in main:** 100.00% (Passed 0 times, Failed 70 times)
   > Stack Traces | 0.373s run time
   > 
   > > Could not find a valid Docker 
environment. Please see logs and check configuration
   > 
   > 
   
   
   
   To view more test analytics, go to the [Test Analytics 
Dashboard](https://app.codecov.io/gh/apache/pinot/tests/deepthi912%3AvalidDocsUpserts)
   📋 Got 3 mins? [Take this short 
survey](https://forms.gle/22i53Qa1CySZjA6c7) to help us improve Test 
Analytics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


codecov-commenter commented on PR #17695:
URL: https://github.com/apache/pinot/pull/17695#issuecomment-3894603747

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/17695?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   :white_check_mark: All modified and coverable lines are covered by tests.
   :white_check_mark: Project coverage is 55.59%. Comparing base 
([`6f0a22d`](https://app.codecov.io/gh/apache/pinot/commit/6f0a22d877285a3ae61abfcf6adf2fb36f89a27c?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache))
 to head 
([`b1f8e91`](https://app.codecov.io/gh/apache/pinot/commit/b1f8e9164968493bca6dcb44b29fa03f52d48256?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)).
   :warning: Report is 1 commits behind head on master.
   
   Additional details and impacted files
   
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #17695  +/-   ##
   
   - Coverage 55.63%   55.59%   -0.04% 
   + Complexity  721  720   -1 
   
 Files  2479 2476   -3 
 Lines140436   140428   -8 
 Branches  2237522375  
   
   - Hits  7813478074  -60 
   - Misses5570955756  +47 
   - Partials   6593 6598   +5 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[integration](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[integration2](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[java-11](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[java-21](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `55.59% <ø> (+<0.01%)` | :arrow_up: |
   | 
[temurin](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `55.59% <ø> (-0.04%)` | :arrow_down: |
   | 
[unittests](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `55.59% <ø> (-0.05%)` | :arrow_down: |
   | 
[unittests1](https://app.codecov.io/gh/apache/pinot/pull/17695/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `55.59% <ø> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/pinot/pull/17695?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:rocket: New features to boost your workflow: 
   
   - :package: [JS Bundle 
Analysis](https://docs.codecov.com/docs/javascript-bundle-analysis): Save 
yourself from yourself by tracking and limiting bundle sizes in JS merges.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Avoid Inconsistencies among replicas during Upsert Compaction Tasks [pinot]

2026-02-12 Thread via GitHub


deepthi912 closed pull request #17695: Avoid Inconsistencies among replicas 
during Upsert Compaction Tasks
URL: https://github.com/apache/pinot/pull/17695


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]