[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-05 Thread Jay Kreps (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v7.patch

Attached a patch that takes the correct but slow approach of synchronously 
committing the checkpoint each time we truncate, before fetching restarts.
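
For illustration only, here is a minimal standalone sketch of the "truncate, 
then synchronously checkpoint" ordering described above. The helper name is 
invented and the file content is simplified to a single number; the point is 
only the blocking fsync before fetching is allowed to resume, not the actual 
checkpoint format or the patch's code.

    import java.io.{File, FileOutputStream, PrintWriter}

    // Sketch (assumed names): persist the new recovery point and block until it
    // is physically on disk before the fetcher restarts, so a crash in between
    // cannot leave a checkpoint that points past the truncated log.
    def writeRecoveryPointSync(checkpointFile: File, recoveryPoint: Long): Unit = {
      val out = new FileOutputStream(checkpointFile)
      try {
        val writer = new PrintWriter(out)
        writer.println(recoveryPoint)
        writer.flush()
        out.getFD.sync() // the slow but correct part: a synchronous fsync
      } finally {
        out.close()
      }
    }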

 Avoid fsync on log segment roll
 ---

 Key: KAFKA-615
 URL: https://issues.apache.org/jira/browse/KAFKA-615
 Project: Kafka
  Issue Type: Bug
Reporter: Jay Kreps
Assignee: Neha Narkhede
 Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, 
 KAFKA-615-v3.patch, KAFKA-615-v4.patch, KAFKA-615-v5.patch, 
 KAFKA-615-v6.patch, KAFKA-615-v7.patch


 It still isn't feasible to run without an application-level fsync policy. 
 This is a problem because fsync locks the file, and tuning such a policy is 
 very challenging: the flushes must not be so frequent that seeks reduce 
 throughput, yet not so infrequent that each fsync writes so much data that 
 there is a noticeable jump in latency.
 The remaining problem is the way that log recovery works. Our current policy 
 is that if a clean shutdown occurs we do no recovery. If an unclean shutdown 
 occurs we recover the last segment of all logs. To make this correct we need 
 to ensure that each segment is fsync'd before we create a new segment. Hence 
 the fsync during roll.
 Obviously, if the fsync during roll is the only time fsync occurs, then it 
 will potentially write out the entire segment, which for a 1 GB segment at 
 50 MB/sec might take on the order of 20 seconds. The goal of this JIRA is to 
 eliminate this and make it possible to run with no application-level fsyncs 
 at all, depending entirely on replication and background writeback for 
 durability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-05 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v8.patch

Ack, yes, I did mean to fix the recoveryPoint/logEndOffset issue, I just 
forgot. Attached v8, which includes that. The fix is as you describe: I just 
reset the recovery point to the end of the log.
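
A small standalone sketch of that invariant (field names are stand-ins, not 
the patch's code): after any truncation, the recovery point is pulled back so 
it never points past the end of the log.

    // Sketch only: maintain recoveryPoint <= logEndOffset across truncation.
    final class TruncationSketch(var logEndOffset: Long, var recoveryPoint: Long) {
      def truncateTo(targetOffset: Long): Unit = {
        logEndOffset = math.min(logEndOffset, targetOffset)
        // the fix described above: reset the recovery point to the end of the log
        if (recoveryPoint > logEndOffset)
          recoveryPoint = logEndOffset
      }
    }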



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-04 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v6.patch

Updated patch:
- Removed bad scaladoc
- Improved log corruption test to cover corruption in a non-final segment to 
show that the existing logic works

Actually, the recoverLog method is right. It loops through the unflushed 
segments validating them. When it finds a bad one it truncates to the right 
position in that segment, then loops over all remaining segments and deletes 
them. The confusing part, I think, is that unflushed is an iterator, so 
unflushed.foreach(deleteSegment) actually ends the loop because a 
post-condition of that call is that unflushed.hasNext is false. I agree that 
is kind of tricky. Not sure if there is a clearer way to do it (I tried, that 
was what I came up with...wish we had break).
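
To make the control flow concrete, here is a standalone sketch of the pattern 
being described, with a stand-in segment type rather than the real classes. 
The inner foreach consumes the rest of the iterator, which is what terminates 
the outer while loop without an explicit break.

    // Stand-in for the real log segment class; "valid" replaces actual validation.
    final case class SegmentSketch(baseOffset: Long, valid: Boolean)

    def recoverSketch(segments: Seq[SegmentSketch], recoveryPoint: Long): Unit = {
      // only segments at or beyond the recovery point need re-validation
      val unflushed: Iterator[SegmentSketch] =
        segments.iterator.filter(_.baseOffset >= recoveryPoint)
      while (unflushed.hasNext) {
        val segment = unflushed.next()
        if (!segment.valid) {
          println(s"truncating within segment at ${segment.baseOffset}")
          // consuming the iterator here is what ends the outer loop:
          // after this foreach, unflushed.hasNext is false
          unflushed.foreach(s => println(s"deleting segment at ${s.baseOffset}"))
        }
      }
    }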



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-02 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v5.patch

Attached updated patch v5, rebased against trunk and with added support for 
compression in the write throughput test.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-17 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v4.patch

Rebased patch to trunk.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-11 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v3.patch

Patch version v3:
- Found a call to flush the index in Log.roll(). Removed this.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-08 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v2.patch

New patch with a few improvements:
1. Found and fixed a bug in recovery that led to recovering logs even in the 
clean-shutdown case.
2. Now we always resize indexes for all segments during recovery, as the 
configured index size may have changed. Not doing this was a bug in the 
previous patch.
3. Added a unit test that intentionally corrupts a log and checks recovery.

I also did some performance testing on my desktop machine. We can sustain very 
high throughput, but as we approach the maximum throughput of the drive, 
latency gets worse and worse.

As one data point, I could do 75 MB/sec of sustained writes across 500 logs on 
a single-drive machine that peaks at about 120 MB/sec, with an average write 
latency under 1 ms and a maximum latency of about 350 ms.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-06 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v1.patch

Attached a draft patch for a first version of this for early feedback. A few 
details remain to be worked out.

This patch removes the per-data-directory .kafka_cleanshutdown file as well as 
the concept of a clean shutdown. The concept of a clean shutdown is replaced 
with the concept of a recovery point. The recovery point is the offset from 
which the log must be recovered. Recovery points are checkpointed in a 
per-data-directory file called recovery-point-offset-checkpoint, which uses the 
normal offset checkpoint file format.
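
As a rough illustration of the checkpoint idea, here is a self-contained 
sketch that writes per-partition recovery points to such a file. The exact 
line layout (a version line, an entry count, then one "topic partition offset" 
line per entry) is my assumption about the offset checkpoint format, not 
something this patch defines.

    import java.io.{File, PrintWriter}

    // Sketch: write recovery points for every topic-partition in one data directory.
    def writeRecoveryPointCheckpoint(dir: File, offsets: Map[(String, Int), Long]): Unit = {
      val out = new PrintWriter(new File(dir, "recovery-point-offset-checkpoint"))
      try {
        out.println(0)             // assumed format version
        out.println(offsets.size)  // number of entries that follow
        for (((topic, partition), offset) <- offsets)
          out.println(s"$topic $partition $offset")
      } finally {
        out.close()
      }
    }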

Previously we always recovered the last log segment unless a clean shutdown was 
recorded. Now we recover from the recovery point--which may mean recovering 
many segments. We do not, however, recover partial segments: if the recovery 
point falls in the middle of a segment we recover that segment from the 
beginning.
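
A tiny standalone sketch of that segment selection (stand-in types, and it 
assumes the segment list is sorted by base offset): recovery starts at the 
base offset of the segment containing the recovery point, never mid-segment.

    final case class SegmentRef(baseOffset: Long)

    // Sketch: choose which segments to re-validate given a recovery point.
    def segmentsToRecover(sortedSegments: Seq[SegmentRef], recoveryPoint: Long): Seq[SegmentRef] = {
      // base offset of the segment that contains the recovery point (or the first segment)
      val startBase = sortedSegments.filter(_.baseOffset <= recoveryPoint)
                                    .lastOption.map(_.baseOffset).getOrElse(0L)
      sortedSegments.filter(_.baseOffset >= startBase)
    }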

On shutdown we force a flush and checkpoint, which has the same effect the 
.kafka_cleanshutdown file had before.

Deleting the recovery-point-offset-checkpoint file will cause a full recovery 
of the log on restart, which is a nice feature if you suspect any kind of 
corruption in the log.

Log.flush now takes an offset argument and flushes from the recovery point up 
to the given offset. This gives more granular control, so we can avoid syncing 
(and hence locking) the active segment.
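
Here is a simplified, self-contained sketch of that flush-up-to-an-offset 
shape (stand-in types; the real segment selection is more careful than this): 
only segments below the given offset are synced, and the recovery point is 
advanced afterwards.

    final case class FlushableSegment(baseOffset: Long) {
      def flush(): Unit = println(s"fsync segment starting at $baseOffset")
    }

    final class LogSketch(val sortedSegments: Vector[FlushableSegment], var recoveryPoint: Long) {
      // Flush data in [recoveryPoint, offset) and advance the recovery point;
      // segments at or beyond `offset` (the active one included) are untouched.
      def flush(offset: Long): Unit = {
        if (offset <= recoveryPoint) return
        val from = sortedSegments.filter(_.baseOffset <= recoveryPoint)
                                 .lastOption.map(_.baseOffset).getOrElse(0L)
        sortedSegments.filter(s => s.baseOffset >= from && s.baseOffset < offset)
                      .foreach(_.flush())
        recoveryPoint = offset
      }
    }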

Log.roll() now uses the scheduler to make its flush asynchronous. This flush 
only covers up to the segment that was just completed, not the newly created 
segment, so there should be no locking of the active segment anymore.
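
And a minimal sketch of the asynchronous flush on roll, using a plain executor 
as a stand-in for Kafka's scheduler; flushUpTo represents the offset-bounded 
flush sketched above.

    import java.util.concurrent.Executors

    val backgroundFlusher = Executors.newSingleThreadExecutor()

    // Sketch: after creating the new active segment, schedule a flush that only
    // covers offsets below the new segment's base offset, i.e. the completed
    // segment(s), never the segment that appends are now going to.
    def rollSketch(newSegmentBaseOffset: Long, flushUpTo: Long => Unit): Unit = {
      backgroundFlusher.submit(new Runnable {
        def run(): Unit = flushUpTo(newSegmentBaseOffset)
      })
    }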

The per-topic flush policy based on number of messages and time still remains, 
but it now defaults to off, so by default we rely only on replication and 
background writeback for durability.

I did some preliminary performance testing, and we can indeed run with no 
application-level flush policy with reasonable latency, which is both 
convenient (no tuning to do) and yields much better throughput. I will do more 
testing and report results.
