Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-04-01 Thread via GitHub


gianm merged PR #16217:
URL: https://github.com/apache/druid/pull/16217


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


abhishekrb19 commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545948002


##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/AbstractParallelIndexSupervisorTaskTest.java:
##
@@ -546,12 +548,17 @@ public ListenableFuture runTask(String taskId, 
Object taskObject)
 
 @Override
 public ListenableFuture> taskReportAsMap(String taskId)
+{
+  return Futures.immediateFuture(null);

Review Comment:
   Ok, makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on PR #16217:
URL: https://github.com/apache/druid/pull/16217#issuecomment-2029100681

   Thanks a lot for the review, @abhishekrb19 !
   Do you think I can include your suggestions in my follow up PR as they are 
all related to tests?
   I don't want to invoke another CI cycle if I can help it  .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545933086


##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/AbstractParallelIndexSupervisorTaskTest.java:
##
@@ -546,12 +548,17 @@ public ListenableFuture runTask(String taskId, 
Object taskObject)
 
 @Override
 public ListenableFuture> taskReportAsMap(String taskId)
+{
+  return Futures.immediateFuture(null);

Review Comment:
   It was doing that originally but not needed right now as I have added the 
other method `getLiveReportsForTask()` just below this one.
   
   This is anyway used only in the tests and I plan to fix it back up once I 
replace the `Map` in the `OverlordClient` with 
`TaskReport.ReportMap`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545932328


##
indexing-service/src/main/java/org/apache/druid/indexing/common/TaskReport.java:
##
@@ -48,13 +51,29 @@ public interface TaskReport
   /**
* Returns an order-preserving map that is suitable for passing into {@link 
TaskReportFileWriter#write}.
*/
-  static Map buildTaskReports(TaskReport... taskReports)
+  static ReportMap buildTaskReports(TaskReport... taskReports)
   {
-// Use LinkedHashMap to preserve order of the reports.
-Map taskReportMap = new LinkedHashMap<>();
+ReportMap taskReportMap = new ReportMap();
 for (TaskReport taskReport : taskReports) {
   taskReportMap.put(taskReport.getReportKey(), taskReport);
 }
 return taskReportMap;
   }
+
+  /**
+   * Represents an ordered map from report key to a TaskReport that is 
compatible
+   * for writing out reports to files or serving over HTTP.
+   * 
+   * This class is needed for Jackson serde to work correctly. Without this 
class,
+   * a TaskReport is serialized without the type information and cannot be
+   * deserialized back into a concrete implementation.
+   */
+  class ReportMap extends LinkedHashMap

Review Comment:
   No, I can add a test to verify the order. Although, I don't see any actual 
task writing a report map that contains multiple entries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545932328


##
indexing-service/src/main/java/org/apache/druid/indexing/common/TaskReport.java:
##
@@ -48,13 +51,29 @@ public interface TaskReport
   /**
* Returns an order-preserving map that is suitable for passing into {@link 
TaskReportFileWriter#write}.
*/
-  static Map buildTaskReports(TaskReport... taskReports)
+  static ReportMap buildTaskReports(TaskReport... taskReports)
   {
-// Use LinkedHashMap to preserve order of the reports.
-Map taskReportMap = new LinkedHashMap<>();
+ReportMap taskReportMap = new ReportMap();
 for (TaskReport taskReport : taskReports) {
   taskReportMap.put(taskReport.getReportKey(), taskReport);
 }
 return taskReportMap;
   }
+
+  /**
+   * Represents an ordered map from report key to a TaskReport that is 
compatible
+   * for writing out reports to files or serving over HTTP.
+   * 
+   * This class is needed for Jackson serde to work correctly. Without this 
class,
+   * a TaskReport is serialized without the type information and cannot be
+   * deserialized back into a concrete implementation.
+   */
+  class ReportMap extends LinkedHashMap

Review Comment:
   No, I can add a test to verify the order. Although, I don't see any actual 
task writing a report map that contains multiple entries. Also not sure why the 
order was considered to be important in the first place, its json anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545931052


##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskReportSerdeTest.java:
##
@@ -55,47 +56,56 @@ public TaskReportSerdeTest()
   }
 
   @Test
-  public void testSerde() throws Exception
+  public void testSerdeOfIngestionReport() throws Exception
   {
-IngestionStatsAndErrorsTaskReport report1 = new 
IngestionStatsAndErrorsTaskReport(
-"testID",
-new IngestionStatsAndErrors(
-IngestionState.BUILD_SEGMENTS,
-ImmutableMap.of(
-"hello", "world"
-),
-ImmutableMap.of(
-"number", 1234
-),
-"an error message",
-true,
-1000L,
-ImmutableMap.of("PartitionA", 5000L),
-5L,
-10L
-)
-);
-String report1serialized = jsonMapper.writeValueAsString(report1);
-IngestionStatsAndErrorsTaskReport report2 = 
(IngestionStatsAndErrorsTaskReport) jsonMapper.readValue(
-report1serialized,
-TaskReport.class
-);
-Assert.assertEquals(report1, report2);
-Assert.assertEquals(report1.hashCode(), report2.hashCode());
+IngestionStatsAndErrorsTaskReport originalReport = 
buildTestIngestionReport();
+String reportJson = jsonMapper.writeValueAsString(originalReport);
+TaskReport deserialized = jsonMapper.readValue(reportJson, 
TaskReport.class);
+
+Assert.assertTrue(deserialized instanceof 
IngestionStatsAndErrorsTaskReport);
+
+IngestionStatsAndErrorsTaskReport deserializedReport = 
(IngestionStatsAndErrorsTaskReport) deserialized;
+Assert.assertEquals(originalReport, deserializedReport);
+  }
+
+  @Test
+  public void testSerdeOfKillTaskReport() throws Exception
+  {
+KillTaskReport originalReport = new KillTaskReport("taskId", new 
KillTaskReport.Stats(1, 2, 3));
+String reportJson = jsonMapper.writeValueAsString(originalReport);
+TaskReport deserialized = jsonMapper.readValue(reportJson, 
TaskReport.class);
+
+Assert.assertTrue(deserialized instanceof KillTaskReport);
 
+KillTaskReport deserializedReport = (KillTaskReport) deserialized;
+Assert.assertEquals(originalReport, deserializedReport);
+  }
+
+  @Test
+  public void testWriteReportMapToFileAndRead() throws Exception
+  {
+IngestionStatsAndErrorsTaskReport report1 = buildTestIngestionReport();
 final File reportFile = temporaryFolder.newFile();
 final SingleFileTaskReportFileWriter writer = new 
SingleFileTaskReportFileWriter(reportFile);
 writer.setObjectMapper(jsonMapper);
-Map reportMap1 = TaskReport.buildTaskReports(report1);
+TaskReport.ReportMap reportMap1 = TaskReport.buildTaskReports(report1);
 writer.write("testID", reportMap1);
 
-Map reportMap2 = jsonMapper.readValue(
-reportFile,
-new TypeReference>() {}
-);
+TaskReport.ReportMap reportMap2 = jsonMapper.readValue(reportFile, 
TaskReport.ReportMap.class);
 Assert.assertEquals(reportMap1, reportMap2);
   }
 
+  @Test
+  public void testWriteReportMapToStringAndRead() throws Exception

Review Comment:
   Sure, that makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


kfaraz commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545930630


##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/AbstractParallelIndexSupervisorTaskTest.java:
##
@@ -546,12 +548,17 @@ public ListenableFuture runTask(String taskId, 
Object taskObject)
 
 @Override
 public ListenableFuture> taskReportAsMap(String taskId)

Review Comment:
   Yes, I have that change in a follow up PR. Didn't do it here as it requires 
moving all the `TaskReport` related classes to the `druid-processing` module, 
so that `OverlordClient` can use it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-31 Thread via GitHub


abhishekrb19 commented on code in PR #16217:
URL: https://github.com/apache/druid/pull/16217#discussion_r1545923498


##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/AbstractParallelIndexSupervisorTaskTest.java:
##
@@ -546,12 +548,17 @@ public ListenableFuture runTask(String taskId, 
Object taskObject)
 
 @Override
 public ListenableFuture> taskReportAsMap(String taskId)

Review Comment:
   Can `taskReportAsMap()` now return the concrete type `TaskReport.ReportMap` 
instead of `Map`?



##
indexing-service/src/main/java/org/apache/druid/indexing/common/TaskReport.java:
##
@@ -48,13 +51,29 @@ public interface TaskReport
   /**
* Returns an order-preserving map that is suitable for passing into {@link 
TaskReportFileWriter#write}.
*/
-  static Map buildTaskReports(TaskReport... taskReports)
+  static ReportMap buildTaskReports(TaskReport... taskReports)
   {
-// Use LinkedHashMap to preserve order of the reports.
-Map taskReportMap = new LinkedHashMap<>();
+ReportMap taskReportMap = new ReportMap();
 for (TaskReport taskReport : taskReports) {
   taskReportMap.put(taskReport.getReportKey(), taskReport);
 }
 return taskReportMap;
   }
+
+  /**
+   * Represents an ordered map from report key to a TaskReport that is 
compatible
+   * for writing out reports to files or serving over HTTP.
+   * 
+   * This class is needed for Jackson serde to work correctly. Without this 
class,
+   * a TaskReport is serialized without the type information and cannot be
+   * deserialized back into a concrete implementation.
+   */
+  class ReportMap extends LinkedHashMap

Review Comment:
   Are there any tests that verify the reports are indeed ordered since we rely 
on a `LinkedHashMap`? Just looking at the callers of `buildTaskReports()`, I 
don't seem to find any.



##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/AbstractParallelIndexSupervisorTaskTest.java:
##
@@ -546,12 +548,17 @@ public ListenableFuture runTask(String taskId, 
Object taskObject)
 
 @Override
 public ListenableFuture> taskReportAsMap(String taskId)
+{
+  return Futures.immediateFuture(null);

Review Comment:
   Should this call `getLiveReportsForTask(taskId)`?



##
indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskReportSerdeTest.java:
##
@@ -55,47 +56,56 @@ public TaskReportSerdeTest()
   }
 
   @Test
-  public void testSerde() throws Exception
+  public void testSerdeOfIngestionReport() throws Exception
   {
-IngestionStatsAndErrorsTaskReport report1 = new 
IngestionStatsAndErrorsTaskReport(
-"testID",
-new IngestionStatsAndErrors(
-IngestionState.BUILD_SEGMENTS,
-ImmutableMap.of(
-"hello", "world"
-),
-ImmutableMap.of(
-"number", 1234
-),
-"an error message",
-true,
-1000L,
-ImmutableMap.of("PartitionA", 5000L),
-5L,
-10L
-)
-);
-String report1serialized = jsonMapper.writeValueAsString(report1);
-IngestionStatsAndErrorsTaskReport report2 = 
(IngestionStatsAndErrorsTaskReport) jsonMapper.readValue(
-report1serialized,
-TaskReport.class
-);
-Assert.assertEquals(report1, report2);
-Assert.assertEquals(report1.hashCode(), report2.hashCode());
+IngestionStatsAndErrorsTaskReport originalReport = 
buildTestIngestionReport();
+String reportJson = jsonMapper.writeValueAsString(originalReport);
+TaskReport deserialized = jsonMapper.readValue(reportJson, 
TaskReport.class);
+
+Assert.assertTrue(deserialized instanceof 
IngestionStatsAndErrorsTaskReport);
+
+IngestionStatsAndErrorsTaskReport deserializedReport = 
(IngestionStatsAndErrorsTaskReport) deserialized;
+Assert.assertEquals(originalReport, deserializedReport);
+  }
+
+  @Test
+  public void testSerdeOfKillTaskReport() throws Exception
+  {
+KillTaskReport originalReport = new KillTaskReport("taskId", new 
KillTaskReport.Stats(1, 2, 3));
+String reportJson = jsonMapper.writeValueAsString(originalReport);
+TaskReport deserialized = jsonMapper.readValue(reportJson, 
TaskReport.class);
+
+Assert.assertTrue(deserialized instanceof KillTaskReport);
 
+KillTaskReport deserializedReport = (KillTaskReport) deserialized;
+Assert.assertEquals(originalReport, deserializedReport);
+  }
+
+  @Test
+  public void testWriteReportMapToFileAndRead() throws Exception
+  {
+IngestionStatsAndErrorsTaskReport report1 = buildTestIngestionReport();
 final File reportFile = temporaryFolder.newFile();
 final SingleFileTaskReportFileWriter writer = new 
SingleFileTaskReportFileWriter(reportFile);
 writer.setObjectMapper(jsonMapper);
-Map reportMap1 = TaskReport.buildTaskReports(report1);
+TaskReport.ReportMap 

[PR] Cleanup serialiazation of TaskReportMap (druid)

2024-03-29 Thread via GitHub


kfaraz opened a new pull request, #16217:
URL: https://github.com/apache/druid/pull/16217

   The serialization of a `Map` was originally fixed in 
https://github.com/apache/druid/pull/12938.
   Essentially, while serializing a `Map` or even a `List` containing 
`TaskReport` objects, the `type` information is lost. Thus, the object cannot 
be serialized back.
   
   This is a known issue with Jackson.
   
   The way this has been tackled in the Druid code till now is:
   - Use special logic in `SingleFileTaskReportFileWriter.writeReportToStream()`
   - For live reports, build a raw map instead of a concrete `TaskReport` object
   
   This PR attempts to simplify that solution by adding a new 
`TaskReport.ReportMap` class.
   
   ### Changes
   - Add class `TaskReport.ReportMap`
   - Have `TaskReport.buildTaskReports()` build the new class
   - Add serde tests for writing to string, file for the known task report types


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org