[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

2020-08-03 Thread GitBox


kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464012842



##
File path: core/src/main/scala/kafka/log/Log.scala
##
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
   @dhruvilshah3 Sounds good!
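   For reference, a hypothetical call site under the new signature (the retention check and the `deletable` value are illustrative, not taken from the PR):
   
   // Sketch only: a retention-driven caller passing the new reason argument.
   removeAndDeleteSegments(deletable, asyncDelete = true, RetentionMsBreachDeletion)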
   









[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

2020-08-02 Thread GitBox


kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464213840



##
File path: core/src/main/scala/kafka/log/Log.scala
##
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
+        println(s"${reason.reasonString(this, toDelete)}")

Review comment:
   Did you mean to use `info` level logging?
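   For reference, a minimal sketch of the info-level variant at this point (the surrounding method body is abbreviated):
   
   // Sketch only: route the message through the Log's Logging helper instead of
   // println, so it carries the log's identifying prefix and respects the
   // configured log level.
   val toDelete = segments.toList
   info(reason.reasonString(this, toDelete))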
   

##
File path: core/src/main/scala/kafka/log/Log.scala
##
@@ -2686,3 +2670,50 @@ object LogMetricNames {
     List(NumLogSegments, LogStartOffset, LogEndOffset, Size)
   }
 }
+
+sealed trait SegmentDeletionReason {
+  def reasonString(log: Log, toDelete: Iterable[LogSegment]): String
+}
+
+case object RetentionMsBreachDeletion extends SegmentDeletionReason {
+  override def reasonString(log: Log, toDelete: Iterable[LogSegment]): String = {
+    s"Deleting segments due to retention time ${log.config.retentionMs}ms breach: ${toDelete.mkString(",")}"
+  }
+}
+
+case object RetentionSizeBreachDeletion extends SegmentDeletionReason {
+  override def reasonString(log: Log, toDelete: Iterable[LogSegment]): String = {
+    s"Deleting segments due to retention size ${log.config.retentionSize} breach. " +
+      s"Current log size is ${log.size}. ${toDelete.mkString(",")}"

Review comment:
   nit: replace "." with ":" 
   
   `s"Current log size is ${log.size}: ${toDelete.mkString(",")}"`

##
File path: core/src/main/scala/kafka/log/LogSegment.scala
##
@@ -413,7 +413,7 @@ class LogSegment private[log] (val log: FileRecords,
   override def toString: String = "LogSegment(baseOffset=" + baseOffset +
     ", size=" + size +
     ", lastModifiedTime=" + lastModified +
-    ", largestTime=" + largestTimestamp +
+    ", largestRecordTimestamp=" + largestRecordTimestamp +

Review comment:
   Should we keep `largestTime` from the LHS and print both `largestRecordTimestamp` and `largestTime`?
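   For illustration, a sketch with both fields printed (assuming `largestTimestamp` stays available on LogSegment; the placement of the closing ")" is illustrative since the diff does not show the rest of the expression):
   
   override def toString: String = "LogSegment(baseOffset=" + baseOffset +
     ", size=" + size +
     ", lastModifiedTime=" + lastModified +
     ", largestTime=" + largestTimestamp +                    // keep the existing field
     ", largestRecordTimestamp=" + largestRecordTimestamp +   // and add the new one
     ")"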

##
File path: core/src/main/scala/kafka/log/Log.scala
##
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
   @dhruvilshah3 Sounds good!
   
   @ijuma Good point. I'd guess there will also be an increase in logging during the initial hotset reduction, but that will be temporary.
   









[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

2020-08-01 Thread GitBox


kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r463914087



##
File path: core/src/main/scala/kafka/log/Log.scala
##
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
   If we passed the deletion reason further down into the `deleteSegmentFiles` method, it seems we could print the reason string just once for a batch of segments being deleted, and the reason string itself could explain why the batch is being deleted:
   
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2519
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2526
   
   ex: `info(s"Deleting segments due to $reason: ${segments.mkString(",")}")`
   
   where `$reason` expands to something like `retention time 120ms breach`.
   
   The drawback is that we sometimes cannot print segment-specific information, since the message is at the batch level. But segment-level deletion logging could bloat our server logs, so batching the log output may be the better trade-off.
   
   What are your thoughts?
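   A rough sketch of the batched variant, assuming `deleteSegmentFiles` keeps roughly its current shape and gains a `reason` parameter (the exact signature and the elided body are assumptions, not the PR's code):
   
   private def deleteSegmentFiles(segments: Iterable[LogSegment],
                                  asyncDelete: Boolean,
                                  reason: SegmentDeletionReason): Unit = {
     // Sketch only: one log line per batch instead of one per segment.
     if (segments.nonEmpty)
       info(reason.reasonString(this, segments))
     // ... existing per-segment rename and (async) file deletion unchanged ...
   }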








