[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v7.patch

Patch which uses the correct but slow approach of synchronously committing the checkpoint each time we truncate, before fetching restarts.

Avoid fsync on log segment roll
-------------------------------

Key: KAFKA-615
URL: https://issues.apache.org/jira/browse/KAFKA-615
Project: Kafka
Issue Type: Bug
Reporter: Jay Kreps
Assignee: Neha Narkhede
Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, KAFKA-615-v3.patch, KAFKA-615-v4.patch, KAFKA-615-v5.patch, KAFKA-615-v6.patch, KAFKA-615-v7.patch

It still isn't feasible to run without an application-level fsync policy. This is a problem because fsync locks the file, and tuning such a policy is very challenging: the flushes must not be so frequent that seeks reduce throughput, yet not so infrequent that each fsync writes so much data that there is a noticeable jump in latency.

The remaining problem is the way that log recovery works. Our current policy is that if a clean shutdown occurs we do no recovery, and if an unclean shutdown occurs we recover the last segment of all logs. To make this correct we need to ensure that each segment is fsync'd before we create a new segment; hence the fsync during roll. Obviously, if the fsync during roll is the only time fsync occurs, it may end up writing out the entire segment, which for a 1GB segment at 50MB/sec could take many seconds.

The goal of this JIRA is to eliminate this and make it possible to run with no application-level fsyncs at all, depending entirely on replication and background writeback for durability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
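The latency figure in the description can be sanity-checked with back-of-the-envelope arithmetic; this small sketch just replays the 1GB segment / 50MB/sec numbers quoted above (illustrative arithmetic only, not a measurement, and the helper name is made up):

```python
# Back-of-the-envelope cost of fsync'ing a fully dirty segment on
# roll, using the figures from the issue description.

def flush_seconds(segment_bytes: int, write_bytes_per_sec: int) -> float:
    """Worst-case time to write back a fully dirty segment."""
    return segment_bytes / write_bytes_per_sec

GB = 1024 * 1024 * 1024
MB = 1024 * 1024
print(round(flush_seconds(1 * GB, 50 * MB), 2))  # 20.48 seconds of blocking I/O
```

At roughly 20 seconds of writeback, an fsync-on-roll that covers a fully dirty segment clearly cannot be allowed to block the produce path.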
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v8.patch

Ack, yes, I did mean to fix the recoveryPoint/logEndOffset issue; I just forgot. Attached v8, which includes that. The fix is as you describe: I just reset the recovery point to the end of the log.
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v6.patch

Updated patch:
- Removed bad scaladoc
- Improved the log corruption test to cover corruption in a non-final segment, to show that the existing logic works

Actually, the recoverLog method is right. It loops through the unflushed segments, validating them. When it finds a bad one, it truncates to the right position in that segment and then loops over all remaining segments and deletes them. The confusing part, I think, is that unflushed is an iterator, so unflushed.foreach(deleteSegment) actually ends the outer loop: a post-condition of that call is that unflushed.hasNext is false. I agree that is kind of tricky. I'm not sure there is a clearer way to do it (I tried; this was what I came up with... I wish we had break).
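The iterator subtlety described in the comment above (calling unflushed.foreach(deleteSegment) exhausts the iterator, which is what terminates the outer loop) holds for any one-pass iterator. Here is an analogous Python sketch of that control flow; it illustrates the idiom only, it is not the actual Scala recoverLog code, and the segment names and is_valid predicate are hypothetical:

```python
# Analogous sketch of the recoverLog control flow: walk the
# unflushed segments in order; at the first invalid one, truncate
# it (stand-in: record it) and drain the shared iterator to delete
# the rest. Draining the iterator also ends the outer loop, since
# both loops consume the same one-pass iterator.

def recover(segments, is_valid):
    """Return (truncated, deleted) lists for segments in offset order."""
    truncated, deleted = [], []
    unflushed = iter(segments)        # one-pass, shared by both loops
    for seg in unflushed:
        if is_valid(seg):
            continue                  # segment checks out; keep scanning
        truncated.append(seg)         # stand-in for in-segment truncation
        deleted.extend(unflushed)     # drains the iterator, so the outer
                                      # loop ends too; no break needed
    return truncated, deleted

# first two segments valid, corruption found in "s2":
print(recover(["s0", "s1", "s2", "s3"], lambda s: s in ("s0", "s1")))
# (['s2'], ['s3'])
```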
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v5.patch

Attached updated patch v5, rebased against trunk, with added support for compression in the write-throughput test.
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v4.patch

Rebased patch to trunk.
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v3.patch

Patch version v3:
- Found a call to flush the index in Log.roll(); removed it.
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v2.patch

New patch with a couple of improvements:
1. Found and fixed a bug in recovery that led to recovering logs even in the clean-shutdown case.
2. We now always resize indexes for all segments during recovery, as the index size may change. Not doing this was a bug in the previous patch.
3. Added a unit test that intentionally corrupts a log and checks recovery.

I also did some performance testing on my desktop machine. We can sustain very high throughput, but as we approach the maximum throughput of the drive, latency gets worse and worse. As one data point, I could do 75MB/sec sustained writes across 500 logs on a single-drive machine that can do a peak of 120MB/sec, with average write latency of 1ms and maximum latency of about 350ms.
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:

    Attachment: KAFKA-615-v1.patch

Attached a draft patch for a first version of this for early feedback. A few details remain to be worked out.

This patch removes the per-data-directory .kafka_cleanshutdown file, and with it the concept of a clean shutdown. The concept of clean shutdown is replaced by the concept of a recovery point: the offset from which the log must be recovered. Recovery points are checkpointed in a per-data-directory file called recovery-point-offset-checkpoint, which uses the normal offset-checkpoint file format.

Previously we always recovered the last log segment unless a clean shutdown was recorded. Now we recover from the recovery point, which may mean recovering many segments. We do not, however, recover partial segments: if the recovery point falls in the middle of a segment, we recover that segment from the beginning. On shutdown we force a flush and checkpoint, which has the same effect the clean-shutdown file did before. Deleting the recovery-point-offset-checkpoint file will cause a full recovery of your logs on restart, which is a nice feature if you suspect any kind of corruption in the log.

Log.flush now takes an offset argument and flushes from the recovery point up to the given offset. This allows more granular control, to avoid syncing (and hence locking) the active segment. Log.roll() now uses the scheduler to make its flush asynchronous. This flush now covers only up to the segment that was just completed, not the newly created segment, so there should be no locking of the active segment any more.
The per-topic flush policy based on message count and time still remains, but it now defaults to off, so by default we rely only on replication and the operating system's background writeback for durability. I did some preliminary performance testing, and we can indeed run with no application-level flush policy with reasonable latency, which is both convenient (no tuning to do) and yields much better throughput. I will do more testing and report results.
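A minimal sketch of the recovery-point checkpointing described in the patch notes above: flush the log up to some offset, then durably record that offset; on restart, recovery starts from the recorded offset, and deleting the checkpoint file forces full recovery. The file layout (JSON) and class name here are hypothetical, not Kafka's actual recovery-point-offset-checkpoint format:

```python
import json
import os
import tempfile

class RecoveryPointCheckpoint:
    """Illustrative per-data-directory recovery-point checkpoint."""

    def __init__(self, path):
        self.path = path

    def write(self, points):
        """Persist {partition: recovery offset}; the checkpoint file
        itself is fsync'd so it survives a crash."""
        with open(self.path, "w") as f:
            json.dump(points, f)
            f.flush()
            os.fsync(f.fileno())

    def read(self):
        """A missing file means no checkpoint: recover everything."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

cp = RecoveryPointCheckpoint(
    os.path.join(tempfile.mkdtemp(), "recovery-point-offset-checkpoint"))
cp.write({"topic-0": 4200})          # after flushing up to offset 4200
print(cp.read().get("topic-0", 0))   # restart recovers from 4200, not 0
```

On shutdown the broker would write this checkpoint after the final flush, playing the role the .kafka_cleanshutdown file used to.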